Search Results: "hle"

5 June 2023

Reproducible Builds: Reproducible Builds in May 2023

Welcome to the May 2023 report from the Reproducible Builds project! In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


Holger Levsen gave a talk at the 2023 edition of the Debian Reunion Hamburg, a semi-informal meetup of Debian-related people in northern Germany. The slides are available online.
In April, Holger Levsen gave a talk at foss-north 2023 titled Reproducible Builds, the first ten years. Last month, however, Holger's talk was covered in a round-up of the conference on the Free Software Foundation Europe (FSFE) blog.
Pronnoy Goswami, Saksham Gupta, Zhiyuan Li, Na Meng and Daphne Yao from Virginia Tech published a paper investigating the Reproducibility of NPM Packages. The abstract includes:
When using open-source NPM packages, most developers download prebuilt packages on npmjs.com instead of building those packages from available source, and implicitly trust the downloaded packages. However, it is unknown whether the blindly trusted prebuilt NPM packages are reproducible (i.e., whether there is always a verifiable path from source code to any published NPM package). [ ] We downloaded versions/releases of 226 most popularly used NPM packages and then built each version with the available source on GitHub. Next, we applied a differencing tool to compare the versions we built against versions downloaded from NPM, and further inspected any reported difference.
The paper reports that among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible, and furthermore that multiple factors contribute to the non-reproducibility including flexible versioning information in the package.json file and the divergent behaviors between distinct versions of tools used in the build process. The paper concludes with insights for future verifiable build procedures. Unfortunately, a PDF is not publicly available yet, but a Digital Object Identifier (DOI) is available on the paper's IEEE page.
Elsewhere in academia, Betul Gokkaya, Leonardo Aniello and Basel Halak of the School of Electronics and Computer Science at the University of Southampton published a new paper containing a broad overview of attacks and comprehensive risk assessment for software supply chain security. Their paper, titled Software supply chain: review of attacks, risk assessment strategies and security controls, analyses the most common software supply-chain attacks by providing the latest trend of analyzed attack, and identifies the security risks for open-source and third-party software supply chains. Furthermore, their study introduces unique security controls to mitigate analyzed cyber-attacks and risks by linking them with real-life security incidence and attacks . (arXiv.org, PDF)
NixOS is now tracking two new reports at reproducible.nixos.org. Aside from the collection of build-time dependencies of the minimal and Gnome installation ISOs, this page now also contains reports that are restricted to the artifacts that make it into the image. The minimal ISO is currently reproducible except for Python 3.10, which hopefully will be resolved with the coming update to Python version 3.11.
On our rb-general mailing list this month: David A. Wheeler started a thread noting that the OSSGadget project's oss-reproducible tool was measuring something related to, but not the same as, reproducible builds. Initially they had adopted the term semantically reproducible build for what it measured, which they defined as being if its build results can be either recreated exactly (a bit for bit reproducible build), or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases. This generated a significant number of replies, and several were concerned that people might confuse what they were measuring with reproducible builds. After discussion, the OSSGadget developers decided to switch to the term semantically equivalent for what they measured in order to reduce the risk of confusion. Vagrant Cascadian (vagrantc) posted an update about GCC, binutils, and Debian's build-essential set with some progress, some hope, and I daresay, some fears. Lastly, kpcyrd asked a question about building a reproducible Linux kernel package for Arch Linux (answered by Arnout Engelen). In the same thread, David A. Wheeler pointed out that the Linux Kernel documentation now has a chapter about Reproducible kernel builds as well.
In Debian this month, nine reviews of Debian packages were added, 20 were updated and 6 were removed, all adding to our knowledge about identified issues. In addition, Vagrant Cascadian added a link to the source code causing various ecbuild issues. [ ]
The F-Droid project updated its Inclusion How-To with a new section explaining why it considers reproducible builds to be best practice and hopes developers will support the team's efforts to make as many (new) apps reproducible as it reasonably can.
In diffoscope development this month, version 242 was uploaded to Debian unstable by Chris Lamb, who also made a number of changes. In addition, Mattia Rizzolo documented how to (re)-produce a binary blob in the code [ ] and Vagrant Cascadian updated the version of diffoscope in GNU Guix to 242 [ ].
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Holger Levsen uploaded versions 0.7.24 and 0.7.25 to Debian unstable, which added support for Tox versions 3 and 4 with help from Vagrant Cascadian. [ ][ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches. In addition, Jason A. Donenfeld filed a bug (now fixed in the latest alpha version) in the Android issue tracker to report that generateLocaleConfig in Android Gradle Plugin version 8.1.0 generates XML files using non-deterministic ordering, breaking reproducible builds. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:
  • Update the kernel configuration of arm64 nodes to only put required modules in the initrd, to save space in the /boot partition. [ ]
  • A huge number of changes to a new tool to document/track Jenkins node maintenance, including adding --fetch, --help, --no-future and --verbose options [ ][ ][ ][ ] as well as adding a suite of new actions, such as apt-upgrade, command, deploy-git, rmstamp, etc. [ ][ ][ ][ ], in addition to a significant amount of refactoring [ ][ ][ ][ ].
  • Issue warnings if apt has updates to install. [ ]
  • Allow Jenkins to run apt-get update in the maintenance job. [ ]
  • Installed bind9-dnsutils on some Ubuntu 18.04 nodes. [ ][ ]
  • Fixed the Jenkins shell monitor to correctly deal with little-used directories. [ ]
  • Updated the node health check to warn when apt upgrades are available. [ ]
  • Performed some node maintenance. [ ]
In addition, Vagrant Cascadian added the nocheck, nopgo and nolto build options when building gcc-* and binutils packages [ ] as well as performed some node maintenance [ ][ ]. Roland Clobus also updated the openQA configuration to specify longer timeouts and access to the developer mode [ ] and updated the URL used for reproducible Debian Live images [ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

29 May 2023

Russ Allbery: Book haul

I think this is partial because I also have a stack of other books that I missed recording. At some point, I should stop using this method to track book acquisitions in favor of one of the many programs intended for this purpose, but it's in the long list of other things I really should do one of these days. As usual, I have already read and reviewed a few of these. I might be getting marginally better at reading books shortly after I acquire them? Maybe?

Steven Brust Tsalmoth (sff)
C.L. Clark The Faithless (sff)
Oliver Darkshire Once Upon a Tome (non-fiction)
Hernan Diaz Trust (mainstream)
S.B. Divya Meru (sff)
Kate Elliott Furious Heaven (sff)
Steven Flavall Before We Go Live (non-fiction)
R.F. Kuang Babel (sff)
Laurie Marks Dancing Jack (sff)
Arkady Martine Rose/House (sff)
Madeline Miller Circe (sff)
Jenny Odell Saving Time (non-fiction)
Malka Older The Mimicking of Known Successes (sff)
Sabaa Tahir An Ember in the Ashes (sff)
Emily Tesh Some Desperate Glory (sff)
Valerie Valdes Chilling Effect (sff)

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (e.g. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans), which was announced in August 2021 and where an archive of all issued signatures is publicly accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes a somewhat recursive approach to the problem for Haskell, leading to the inadvertently humorous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Courtès wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program, something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot; from two people six years ago, there are now around 100 people in the #bootstrappable IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Michael, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
EleutherAI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Models (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can be affected by training data and model scale.
Back in February's report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy; the topic of the chapter is the Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what's inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what's inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. [ ]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.

Community news On our mailing list this month: Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years. Lastly, there were a number of updates to our website, including:
  • Chris Lamb attempted a number of ways to try and fix the literal {: .lead} markup appearing in the page [ ][ ][ ], made all the "Back to who is involved" links italic [ ], and corrected the syntax of the _data/sponsors.yml file [ ].
  • Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].
Lastly, Mattia Rizzolo moved some old sponsors to a former section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian
  • Vagrant Cascadian reported on Debian's build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general. Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears. [ ]
  • Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls; that is to say, the snapshot service is not capturing 100% of all historical states of the Debian archive. This is relevant to reproducibility because without the availability of historical versions, it becomes impossible to repeat a build at a future date in order to correlate checksums.
  • 20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
  • Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches.

diffoscope development diffoscope version 241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well as a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
  • Holger Levsen:
    • Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]
    • Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]
    • In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]
    • Updated Arch Linux testing to clean up leftover files in /tmp/archlinux-ci/ after three days. [ ][ ][ ]
    • Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]
    • Update the node health checks to detect failures to end schroot sessions. [ ]
    • Filter out another duplicate contributor from the contributor statistics. [ ]
  • Mattia Rizzolo:



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

24 April 2023

Jonathan Dowland: Separate hledgers

In a previous blog post I described the use of virtual postings to track accidental personal/family expenses. I've always been uncomfortable with that, and in hledger 1yr I outlined a potential scheme for finally addressing the virtual posting problem.

separate journals
My outline built on top of continuing to maintain both personal and family financial data in the same place, but I've decided that this can't work, because the different "directions" (or signs) of accidental transactions originating from either the family or personal side can't be addressed with any kind of alternate view on the same data. To illustrate with an example: a negative balance in family:liabilities:jon means "family owes jon". A coffee bought by mistake on the family credit card will have a negative posting on the credit card, and thus a positive one on the liabilities account ("jon owes family"). That's fine. But what about when I buy family stuff on a personal card? The other side of the transaction is also going to have a positive sign, so it can't be posted to family:liabilities:jon: it would have to go somewhere else, like jon:liabilities:family. Now I have two accounts which track versions of the same thing, and they cannot be combined with a simple transaction since they're looking at the same value from opposite directions (and signs). Back when I first described the problem I was using a single journal file for all my transactions. After moving to lots of separate journal files (in hledger 1yr), it's become clearer to me that I don't need to maintain the Family and Personal data together, at all: they can be entirely separate journals.

getting data between journals
When I moved to a new set of ledger files for 2023, I needed to carry forward the balances from 2022 in the form of "opening balance" transactions. This was achieved by a report on the 2022 data, exported as CSV, and imported into the 2023 data (all following the scheme outlined by fully-fledged hledger). Separate Personal and Family journals need some information from each other, and I can achieve that in the same way as for opening balances: with an export of the relevant transactions as CSV, then imported on the other side. HLedger's CSV import system is flexible enough that we can effectively invert the sign of liabilities, addressing the problem above.

Worked example
We start with an accidental coffee purchased on the family card (and so this belongs to the Family ledger):
2022-08-20 coffee
    liabilities:creditcard             -3
    liabilities:jon:expenses:coffee     3
I've encoded the expense category that the Personal ledger will be interested in (the last bit, expenses:coffee) as a sub-account of the liabilities category that the Family ledger is interested in1 (the first bit, liabilities:jon). When viewed on the Family side, the expense category is not interesting, and we can hide it with HLedger's alias feature2:
    alias /^liabilities:jon(.*)$/ = liabilities:jon
It then looks like this from the Family side:
2022-08-20 coffee
    liabilities:creditcard             -3
    liabilities:jon                     3
This transaction (and others like it) is exported via
hledger reg -f family/2023-back.journal liabilities:jon: -O csv \
        > jon/import/family/liabilities.csv
(The trailing colon on liabilities:jon: is important here!) In the resulting CSV file, the running example transaction looks like
"55","2022-08-20","","coffee","liabilities:jon:expenses:coffee","  3.00","  3.00"
This is then converted into a journal file by hledger import. The rules file for the import is very simple: the fields date, description, account1 and amount are taken as-is; account2 is hard-coded to liabilities:family. The resulting transaction looks like
2022-08-20 coffee
    liabilities:jon:expenses:coffee     3
    liabilities:family                 -3
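A minimal sketch of such a rules file (not necessarily the exact file used; the column names are simply read off the register CSV shown above, and the skip accounts for its header row):
# sketch: import rules for the exported family-liabilities CSV
skip 1
fields txnidx, date, code, description, account1, amount, total
account2 liabilities:family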
Before this journal is included by the main one, we have to adjust the expense account to remove the liabilities:jon: prefix. The import rules can't do this3, so we use another journal file as a go-between with another alias rule:
    alias /^liabilities.jon:/ =
This results, finally, in the following transaction in the Personal ledger:
2022-08-20 coffee
    expenses:coffee                     3
    liabilities:family                 -3
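A minimal sketch of that go-between journal, assuming the imported data lands in a file called liabilities.journal (the file name here is illustrative):
; go-between journal: strip the liabilities:jon: prefix, then pull in the imported data
alias /^liabilities.jon:/ =
include liabilities.journal
Declaring the alias before the include means it applies to the entries pulled in from the imported journal.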
avoiding double-counting
There's one set of transactions that we don't want to export across this divide, and that's because they're already there: any time I transfer money from myself to the family accounts (or vice versa) to address the accrued debt, the transaction is visible from both my family and personal statements. To avoid exporting these and double-counting them, I make sure those transactions don't post to an account matching the pattern used in the hledger reg report. That's what the trailing colon is for: it ensures I only export transactions which are to a sub-account of liabilities:jon, and not to the root account liabilities:jon itself, which is where I put the repayment transactions. I could instead use a more explicit sub-account like liabilities:jon:repayments or similar, since the trailing colon is quite subtle, but this works for me.

Wrap up
I've been really on the fence as to whether the complexity of this scheme is worth it to avoid the virtual postings. The previous scheme was much simpler. I have definitely made some mistakes with it, which didn't get caught by the double-entry rules that virtual postings ignore, but they're for small sums of money anyway. On the other hand, a lot of the "machinery" of this already existed for getting opening balances between calendar years, and the gory details are written down and hidden inside the Makefile. I also expect that I will continue to see advantages in having Family and Personal entirely separate, as they can each develop and adapt to their own needs without having to consider the other side of things every time. It's a running experiment, and time will tell if it's a good idea.

  1. This scheme was originally suggested to me by Pranesh on Twitter (described in dues), but I discounted it at the time because of the exact arrangement they suggested, not realising the broader idea might work.
  2. I've hand-waved one problem with using hledger aliases here. If we use them as described, to hide the Personal expense details, we need them to not be applied when performing the CSV-generating report. Therefore, in practice I have them in a front-most family/2023.journal file, which imports the data from another family/2023-back.journal, and the CSV export is performed on the backing journal with the data and not the alias.
  3. HLedger import rules can't manipulate the fields from the CSV a great deal, but one change I proposed and started hacking on would allow for this: to expose Regexp match-groups as interpolatable tokens: https://github.com/simonmichael/hledger/issues/2009.

7 March 2023

Jonathan Dowland: date warping in HLedger

My credit card and bank account rarely agree on the date for when I pay it off1. Since I added balance assertions for bank account transactions, I need the transaction in my ledger to match what the bank thinks, otherwise the balance assertions would start to fail. The skew is not normally more than a couple of days, and could be corrected by changing the date for just one of the two postings. But the skew is not very important, and altering the posting date could be used for something more useful.

date warping credit card repayments
My credit card bills land halfway through the month, so February's bill covers transactions between January 15th and February 14th. I pay off the bill in full each month using Direct Debit. The credit card company consider the bill paid immediately, but they don't actually draw it until the end of the month (Jan 31 in the running example). This means the payment transaction for a given month lands halfway through the period covered by the next month's bill. The credit card bill itself shows the payment date at the end of the month but presents the transaction "warped" right to the start. This is actually useful, because it means the balance is zero for the first purchase on the bill. The credit card data in CSV form has the repayment transaction at the date it occurred, not warped to the start of the period. When I import this into HLedger, the credit card account balance for each new transaction does not match the statement right up to the point of the repayment, halfway through. This makes spot-checking that the imported data matches the statement a bit more awkward. So, I have started "warping" the payment transaction to the start of the billing period, just like the credit card statement does:
2022-12-31  pay credit card
  asset:bank      -500
  liabilities:credit card  ;  date:2022-12-15
I can then spot-check the transactions in HLedger after import, in particular the final one, and the final account balance, and then write a manual balance assertion when I'm finished. I'd quite like to automate the adjusted posting date, too, but I haven't figured out how to do that just yet.

date warping for refunds
Another thing I've found "date warping" useful for is marrying up refunds with their related purchase. Imagine I spent 200 on some shoes in late January, but returned most of them in early February:
2023-01-25  buy some shoes. hedging on the size
  liabilities:credit card      -200
  expenses:shoes
2023-02-05  return the ones that don't fit
  liabilities:credit card       150
  expenses:shoes
If I look at how much I've spent on shoes per month, it looks odd: 200 in January (although ultimately I only spent 50), and -150 in February.
$ hledger bal -Mt expenses:shoes
Balance changes in 2023-01-01..2023-02-28:
                ||  Jan   Feb 
================++===============
 expenses:shoes ||  200  -150 
----------------++---------------
                ||  200  -150 
By "warping" the refund's posting to the expense account to the purchase date, how much I ultimately spent on shoes is more properly reflected:
2023-02-05  return the ones that don't fit
  liabilities:credit card       150
  expenses:shoes  ; date:2023-01-25
resulting in
$ hledger bal -Mt expenses:shoes
Balance changes in 2023-01-01..2023-02-28:
                ||  Jan  Feb 
================++===========
 expenses:shoes ||   50    0 
----------------++-----------
                ||   50    0 
I suppose whether you'd want to do this is a matter of taste.

  1. Amazon rarely agrees with my bank on when we've paid for things either. For other reasons, Amazon is a beast to tackle in another blog post.

10 February 2023

Jonathan Dowland: HLedger, 1 year on

It's been a year since I started exploring HLedger, and I'm still going. The rollover to 2023 was an opportunity to revisit my approach. Some time ago I stumbled across Dmitry Astapov's HLedger notes (fully-fledged hledger, which I briefly mentioned in eventual consistency) and decided to adopt some of its ideas.

new year, new journal
First up, Astapov encourages starting a new journal file for a new calendar year. I do this for other, accounting-adjacent files as a matter of course, and I did it for my GNUCash files prior to adopting HLedger. But the reason for those is a general suspicion that a simple mistake with that software could irrevocably corrupt my data. I'm much more confident with HLedger, so rolling over at year's end isn't necessary for that. But there are other advantages. A quick, obvious one is that you can get rid of old accounts (such as expense accounts tied to a particular project, now completed).

one journal per import
In the first year, I periodically imported account data via CSV exports of transactions and HLedger's (excellent) CSV import system. I imported all the transactions, once each, into a single, large journal file. Astapov instead advocates for creating a separate journal for each CSV that you wish to import, keeping the CSV around, leaving you with a 1:1 mapping of CSV:journal. Then use HLedger's "include" mechanism to pull them all into the main journal. With the former approach, where the CSV data was imported precisely once, it was only exposed to your import rules once. The workflow ended up being: import transactions; notice some that you could have matched with import rules and auto-coded; write the rule for the next time. With Astapov's approach, you can re-generate the journal from the CSV at any point in the future with an updated set of import rules.

tracking dependencies
Now we get onto the job of driving the generation of all these derivative journal files. Astapov has built a sophisticated system using Haskell's "Shake", with which I'm not yet familiar, but for my sins I'm quite adept at (GNU-flavoured) UNIX Make, so I started building with that. An example rule:
import/jon/amex/%.journal: import/jon/amex/%.csv rules/amex.csv.rules
    rm -f $(@D)/.latest.$*.csv $@
    hledger import --rules-file rules/amex.csv.rules -f $@ $<
This captures the dependency between the journal and the underlying CSV, and also on the relevant rules file; if I modify that, and this target is run in the future, all dependent journals should be re-generated.1

opening balances
It's all fine and well starting over in a new year, and I might be generous enough to forgive debts, but I can't count on others to do the same. We need to carry over some balance information from one year to the next. Astapov has a more complex (or perhaps more featureful) scheme for this involving a custom Haskell program, but I bodged something with a pair of make targets:
import/opening/2023.csv: 2022.journal
    mkdir -p import/opening
    hledger bal -f $< \
                $(list_of_accounts_I_want_to_carry_over) \
        -O csv -N > $@
import/opening/2023.journal: import/opening/2023.csv rules/opening.rules
    rm -f $(@D)/.latest.2023.csv $@
    hledger import --rules-file rules/opening.rules \
        -f $@ $<
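The rules/opening.rules file referenced above isn't shown; a minimal sketch might look like this, assuming the two-column account/balance CSV that hledger bal -O csv -N produces, and with the counter-account and date chosen purely for illustration:
# sketch of rules/opening.rules
skip 1
fields account1, amount
date 2023-01-01
description opening balances
account2 equity:opening balances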
I think these make targets could be golfed into a year-generic rule with a little more work. The nice thing about this approach is that the opening balances for a given year might change, if adjustments are made in prior years. They shouldn't, for real accounts, but very well could for more "virtual" liabilities (including deciding to write off debts).

run lots of reports
Astapov advocates for running lots of reports, and automatically. There's a really obvious advantage of that to me: there's no chance anyone except me will actually interact with HLedger itself. For family finances, I need reports to be able to discuss anything with my wife. Extending my make rules to run reports is trivial. I've gone for HTML reports for the most part, as they're the easiest on the eye. Unfortunately the most useful report to discuss (at least at the moment) would be a list of transactions in a given expense category, and the register/aregister commands did not support HTML as an output format. I submitted my first HLedger patch to add HTML output support to aregister: https://github.com/simonmichael/hledger/pull/2000

addressing the virtual posting problem
I wrote in my original hledger blog post that I had to resort to unbalanced virtual postings in order to record both a liability between my personal cash and family, as well as categorise the spend. I still haven't found a nice way around that. But I suspect having broken out the journal into lots of other journals paves the way to a better solution to the above. The form of a solution I am thinking of is: some scheme whereby the two destination accounts are combined together; perhaps choose one as a primary and encode the other information in sub-accounts under that. For example, repeating the example from my hledger blog post:
2022-01-02 ZTL*RELISH
    family:liabilities:creditcard        -3.00
    family:dues:jon                       3.00
    (jon:expenses:snacks)                 3.00
This could become
2022-01-02 ZTL*RELISH
    family:liabilities:creditcard        -3.00
    family:liabilities:jon:snacks
(I note this is very similar to a solution proposed to me by someone responding on Twitter.) The next step is to recognise that sometimes when looking at the data I care about one aspect, and at other times the other, but rarely both. So for the case where I'm thinking about family finances, I could use account aliases to effectively flatten out the expense category portion and ignore it. On the other hand, when I'm concerned about how I've spent my personal cash and not about how much I owe the family account, I could use aliases to do the opposite: rewrite away the family:liabilities:jon prefix and combine the transactions with the regular jon:expenses account hierarchy. (This is all speculative: I need to actually try this.)

catching errors after an import
When I import the transactions for a given real bank account, I check the final balance against another source: usually a bank statement, to make sure they agree. I wasn't using any of the myriad methods to make sure that this remains true later on, and so there was the risk that I would make an edit to something and accidentally remove a transaction that contributed to that number, and not notice (until the next import). The CSV data my bank gives me for accounts (not for credit cards) also includes a 'resulting balance' field. It was therefore trivial to extend the CSV import rules to add balance assertions to the transactions that are generated. This catches the problem. There are a couple of warts with balance assertions on every such transaction: for example, dealing with the duplicate transaction for paying a credit card: one from the bank statement, one from the credit card. Removing one of the two is sufficient to correct the account balances, but sometimes they don't agree on the transaction date, or the transactions within a given day are sorted slightly differently by HLedger than by the bank. The simple solution is to just manually delete one or two assertions: there remain plenty more for assurance.

going forward
I've only scratched the surface of the suggestions in Astapov's "full fledged HLedger" notes. I'm up to step 2 of 14. I'm expecting to return to it once the changes I've made have bedded in a little bit. I suppose I could anonymize and share the framework (Makefile etc.) that I am using if anyone was interested. It would take some work, though, so I don't know when I'd get around to it.

  1. the rm latest bit is to clear up some state-tracking files that HLedger writes to avoid importing duplicate transactions. In this case, I know better than HLedger.

24 January 2023

Bits from Debian: New Debian Developers and Maintainers (November and December 2022)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

30 December 2022

Chris Lamb: Favourite books of 2022: Non-fiction

In my three most recent posts, I went over the memoirs and biographies, classics and fiction books that I enjoyed the most in 2022. But in the last of my book-related posts for 2022, I'll be going over my favourite works of non-fiction. Books that just missed the cut here include Adam Hochschild's King Leopold's Ghost (1998) on the role of Leopold II of Belgium in the Congo Free State, Johann Hari's Stolen Focus (2022) (a personal memoir relating to how technology is increasingly fragmenting our attention), Amia Srinivasan's The Right to Sex (2021) (a misleadingly named set of philosophic essays on feminism), Dana Heller et al.'s The Selling of 9/11: How a National Tragedy Became a Commodity (2005), John Berger's mind-bending Ways of Seeing (1972) and Louise Richardson's What Terrorists Want (2006).

The Great War and Modern Memory (1975)
Wartime: Understanding and Behavior in the Second World War (1989) Paul Fussell
Rather than describe the battles, weapons, geopolitics or big personalities of the two World Wars, Paul Fussell's The Great War and Modern Memory & Wartime are focused instead on how the two wars have been remembered by their everyday participants. Drawing on the memoirs and memories of soldiers and civilians along with a brief comparison with the actual events that shaped them, Fussell's two books are a compassionate, insightful and moving piece of analysis. Fussell primarily sets himself against the admixture of nostalgia and trauma that obscures the origins and unimaginable experience of participating in these wars; two wars that were, in his view, a "perceptual and rhetorical scandal from which total recovery is unlikely." He takes particular aim at the dishonesty of hindsight:
For the past fifty years, the Allied war has been sanitised and romanticised almost beyond recognition by the sentimental, the loony patriotic, the ignorant and the bloodthirsty. I have tried to balance the scales. [And] in unbombed America especially, the meaning of the war [seems] inaccessible.
The author does not engage in any of the customary rose-tinted view of war, yet he remains understanding and compassionate towards those who try to locate a reason within what was quite often senseless barbarism. If anything, his despondency and pessimism about the Second World War (the war that Fussell himself fought in) shines through quite acutely, and this is especially the case in what he chooses to quote from others:
"It was common [ ] throughout the [Okinawa] campaign for replacements to get hit before we even knew their names. They came up confused, frightened, and hopeful, got wounded or killed, and went right back to the rear on the route by which they had come, shocked, bleeding, or stiff. They were forlorn figures coming up to the meat grinder and going right back out of it like homeless waifs, unknown and faceless to us, like unread books on a shelf."
It would take a rather heartless reader to fail to be sobered by this final simile, and an even colder one to view Fussell's citation of such an emotive anecdote to be manipulative. Still, stories and cruel ironies like this one infuse this often-angry book, but it is not without astute and shrewd analysis as well, especially on the many qualitative differences between the two conflicts that simply cannot be captured by facts and figures alone. For example:
A measure of the psychological distance of the Second [World] War from the First is the rarity, in 1914-1918, of drinking and drunkenness poems.
Indeed so. In fact, what makes Fussell's project so compelling and perhaps even unique is that he uses these non-quantitative measures to try and take stock of what happened. After all, this was a war conducted by humans, not the abstract school of statistics. And what is the value of a list of armaments destroyed by such-and-such a regiment when compared with truly consequential insights into both how the war affected, say, the psychology of postwar literature ("Prolonged trench warfare, whether enacted or remembered, fosters paranoid melodrama, which I take to be a primary mode in modern writing."), the specific words adopted by combatants ("It is a truism of military propaganda that monosyllabic enemies are easier to despise than others") as well as the very grammar of interaction:
The Field Service Post Card [in WW1] has the honour of being the first widespread exemplary of that kind of document which uniquely characterises the modern world: the "Form". [And] as the first widely known example of dehumanised, automated communication, the post card popularised a mode of rhetoric indispensable to the conduct of later wars fought by great faceless conscripted armies.
And this wouldn't be a book review without argument-ending observations that:
Indicative of the German wartime conception [of victory] would be Hitler and Speer's elaborate plans for the ultimate reconstruction of Berlin, which made no provision for a library.
Our myths about the two world wars possess an undisputed power, in part because they contain an essential truth: the atrocities committed by Germany and its allies were not merely extreme or revolting, but their full dimensions (embodied in the Holocaust and the Holodomor) remain essentially inaccessible within our current ideological framework. Yet the two wars are better understood as an abyss in which we were all dragged into the depths of moral depravity, rather than a battle pitched by the forces of light against the forces of darkness. Fussell is one of the few observers that can truly accept and understand this truth and is still able to speak to us cogently on the topic from the vantage point of experience. The Second World War, which looms so large in our contemporary understanding of the modern world (see below), may have been necessary and unavoidable, but Fussell convinces his reader that it was morally complicated "beyond the power of any literary or philosophic analysis to suggest," and that the only way to maintain a naïve belief in the myth that these wars were a Manichaean fight between good and evil is to overlook reality. There are many texts on the two World Wars that can either stir the intellect or move the emotions, but Fussell's two books do both. A uniquely perceptive and intelligent commentary; outstanding.

Longitude (1995) Dava Sobel
Since Man first decided to sail the oceans, knowing one's location has always been critical. Yet doing so reliably used to be a serious problem: if you didn't know where you were, you were far more likely to die and/or lose your valuable cargo. But whilst finding one's latitude (i.e. your north-south position) had effectively been solved by the beginning of the 17th century, finding one's (east-west) longitude was far from trustworthy in comparison. This book, first published in 1995, is therefore something of an anachronism. As in, we readily use the GPS facilities of our phones today without hesitation, so we find it difficult to imagine a reality in which knowing something fundamental like your own location is essentially unthinkable.

It became clear in the 18th century, though, that in order to accurately determine one's longitude, what you actually needed was an accurate clock. In Longitude, therefore, we read of the remarkable story of John Harrison and his quest to create a timepiece that would not only keep time during a long sea voyage but would survive the rough ocean conditions as well. Self-educated and a carpenter by trade, Harrison made a number of important breakthroughs in keeping accurate time at sea, and Longitude describes his novel breakthroughs in a way that is engaging and without talking down to the reader. Still, this book covers much more than that, including the development of accurate longitude going hand-in-hand with advancements in cartography as well as in scientific experiments to determine the speed of light: experiments that led to the formulation of quantum mechanics. It also outlines the work being done by Harrison's competitors. 'Competitors' is indeed the correct word here, as Parliament offered a huge prize to whoever could create such a device, and the ramifications of this tremendous financial incentive are an essential part of this story. For the most part, though, Longitude sticks to the story of Harrison and his evolving obsession with creating the perfect timepiece. Indeed, one reason that Longitude is so resonant with readers is that many of the tropes of the archetypical 'English inventor' are embedded within Harrison himself. That is to say, here is a self-made man pushing against the establishment of the time, with his groundbreaking ideas being underappreciated in his life, or dishonestly purloined by his intellectual inferiors.

At the level of allegory, then, I am minded to interpret this portrait of Harrison as a symbolic distillation of postwar Britain: a nation acutely embarrassed by the loss of the Empire that is now repositioning itself as a resourceful but plucky underdog; a country that, with a combination of the brains of boffins and a healthy dose of charisma and PR, can still keep up with the big boys. (It is this same search for postimperial meaning I find in the fiction of John le Carré, and, far more famously, in the James Bond franchise.) All of this is left to the reader, of course, as what makes Longitude singularly compelling is its gentle manner and tone. Indeed, at times it was as if the doyenne of sci-fi Ursula K. Le Guin had a sideline in popular non-fiction. I realise it's a mark of critical distinction to downgrade the importance of popular science in favour of erudite academic texts, but Longitude is ample evidence that so-called 'pop' science need not be patronising or reductive at all.

Closed Chambers: The Rise, Fall, and Future of the Modern Supreme Court (1998) Edward Lazarus
After the landmark decision by the U.S. Supreme Court in Dobbs v. Jackson Women's Health Organization that ended the Constitutional right to abortion conferred by Roe v. Wade, I prioritised a few books in the queue about the judicial branch of the United States. One of these books was Closed Chambers, which attempts to assay, according to its subtitle, "The Rise, Fall and Future of the Modern Supreme Court". This book is not merely a learned guide to the history and functioning of the Court (although it is completely creditable in this respect); it's actually an 'insider' view of the workings of the institution, as Lazarus was a clerk for Justice Harry Blackmun during the October term of 1988. Lazarus has therefore combined his experience as a clerk and his personal reflections (along with a substantial body of subsequent research) in order to communicate the collapse in comity between the Justices.

Part of this book is therefore a pure history of the Court, detailing its important nineteenth-century judgements (such as Dred Scott which ruled that the Constitution did not consider Blacks to be citizens; and Plessy v. Ferguson which failed to find protection in the Constitution against racial segregation laws), as well as many twentieth-century cases that touch on the rather technical principle of substantive due process. Other layers of Lazarus' book are explicitly opinionated, however, and they capture the author's assessment of the Court's actions in the past and present [1998] day.

Given the role in which he served at the Court, particular attention is given by Lazarus to the function of its clerks. These are revealed as being far more than the mere amanuenses they were hitherto believed to be. Indeed, the book is potentially unique in its claim that the clerks have played a pivotal role in the deliberations, machinations and eventual rulings of the Court. By implication, then, the clerks have played a crucial role in the internal controversies that surround many of the high-profile Supreme Court decisions: decisions that, to the outsider at least, are presented as disinterested interpretations of the Constitution of the United States. This is of especial importance given that, to Lazarus, "for all the attention we now pay to it, the Court remains shrouded in confusion and misunderstanding."

Throughout his book, Lazarus complicates the commonplace view that the Court is divided into two simple right vs. left political factions, and instead documents an ever-evolving series of loosely held but strongly felt cabals, quid pro quo exchanges, outright equivocation and pure personal prejudices. (The age and concomitant illnesses of the Justices also appear to have a not insignificant effect on the Court's rulings as well.) In other words, Closed Chambers is not a book that will be read in a typical civics class in America, and the only time the book resorts to the customary breathless rhetoric about the US federal government is in its opening chapter:
The Court itself, a Greek-style temple commanding the crest of Capitol Hill, loomed above them in the dim light of the storm. Set atop a broad marble plaza and thirty-six steps, the Court stands in splendid isolation appropriate to its place at the pinnacle of the national judiciary, one of the three independent and "coequal" branches of American government. Once dubbed the Ivory Tower by architecture critics, the Court has a Corinthian colonnade and massive twenty-foot-high bronze doors that guard the single most powerful judicial institution in the Western world. Lights still shone in several offices to the right of the Court's entrance, and [ ]
Et cetera, et cetera. But, of course, this encomium to the inherent 'nobility' of the Supreme Court is quickly revealed to be a narrative foil, as Lazarus soon razes this dangerously naïve conception to the ground:
[The] institution is [now] broken into unyielding factions that have largely given up on a meaningful exchange of their respective views or, for that matter, a meaningful explication or defense of their own views. It is of Justices who in many important cases resort to transparently deceitful and hypocritical arguments and factual distortions as they discard judicial philosophy and consistent interpretation in favor of bottom-line results. This is a Court so badly splintered, yet so intent on lawmaking, that shifting 5-4 majorities, or even mere pluralities, rewrite whole swaths of constitutional law on the authority of a single, often idiosyncratic vote. It is also a Court where Justices yield great and excessive power to immature, ideologically driven clerks, who in turn use that power to manipulate their bosses and the institution they ostensibly serve.
Lazarus does not put forward a single, overarching thesis, but in the final chapters, he does suggest a potential future for the Court:
In the short run, the cure for what ails the Court lies solely with the Justices. It is their duty, under the shield of life tenure, to recognize the pathologies affecting their work and to restore the vitality of American constitutionalism. Ultimately, though, the long-term health of the Court depends on our own resolve, on whom [we] select to join that institution.
Back in 1998, Lazarus might have had room for this qualified optimism. But from the vantage point of 2022, it appears that the "resolve" of the United States citizenry was not muscular enough to meet his challenge. After all, Lazarus was writing before Bush v. Gore in 2000, which arrogated to the judicial branch the ability to decide a presidential election; the disillusionment of Barack Obama's failure to nominate a replacement for Scalia; and many other missteps in the Court as well. All of which have now been compounded by the Trump administration's appointment of three Republican-friendly justices to the Court, including hypocritically appointing Justice Barrett a mere 38 days before the 2020 election. And, of course, the leaking and ruling in Dobbs v. Jackson, the true extent of which has not yet been felt.

Not a bit of this is Lazarus' fault, of course, but the Court's recent decisions (as well as the liberal hagiographies of 'RBG') must perforce affect one's reading of the concluding chapters. The other slight defect of Closed Chambers is that, whilst it often implies the importance of the federal and state courts within the judiciary, it only briefly positions the Supreme Court's decisions in relation to what was happening in the House, Senate and White House at the time. This seems to be increasingly relevant as time goes on: after all, it seems fairly clear even to this Brit that relying on an activist Supreme Court to enact progressive laws must be interpreted as a failure of the legislative branch to overcome the perennial problems of the filibuster, culture wars and partisan bickering.

Nevertheless, Lazarus' book is in equal parts ambitious, opinionated, scholarly and, dare I admit it, wonderfully gossipy. By juxtaposing history, memoir, and analysis, Closed Chambers combines an exacting evaluation of the Court's decisions with a lively portrait of the intellectual and emotional intensity that has grown within the Supreme Court's pseudo-monastic environment, all while it struggles with the most impactful legal issues of the day. This book is an excellent and well-written achievement that will likely never be repeated, and a must-read for anyone interested in this ever-increasingly important branch of the US government.

Crashed: How a Decade of Financial Crises Changed the World (2018)
Shutdown: How Covid Shook the World's Economy (2021) Adam Tooze
The economic historian Adam Tooze has often been labelled as an unlikely celebrity, but in the fourteen years since the global financial crisis of 2008, a growing audience has been looking for answers about the various failures of the modern economy. Tooze, a professor of history at New York's Columbia University, has written much that is penetrative and thought-provoking on this topic, and as a result, he has generated something of a cult following amongst economists, historians and the online left.

I actually read two Tooze books this year. The first, Crashed (2018), catalogues the scale of government intervention required to prop up global finance after the 2008 financial crisis, and it characterises the different ways that countries around the world failed to live up to the situation, such as doing far too little, or taking action far too late. The connections between the high-risk subprime loans, credit default swaps and the resulting liquidity crisis in the US in late 2008 are fairly well known today, in part thanks to films such as Adam McKay's 2015 The Big Short and much improved economic literacy in media reportage. But Crashed makes the implicit claim that, whilst the specific and structural origins of the 2008 crisis are worth scrutinising in exacting detail, it is the reaction of states in the months and years after the crash that has been overlooked as a result. After all, this is a reaction that has not only shaped a new economic order, it has created one that does not fit any conventional idea about the way the world 'ought' to be run. Tooze connects the original American banking crisis to the (multiple) European debt crises with a larger crisis of liberalism. Indeed, Tooze somehow manages to cover all these topics and more, weaving in Trump, Brexit and Russia's 2014 annexation of Crimea, as well as the evolving role of China in the post-2008 economic order.

Where Crashed focused on the constellation of consequences that followed the events of 2008, Shutdown is a clear and comprehensive account of the way the world responded to the economic impact of Covid-19. The figures are often jaw-dropping: soon after the disease spread around the world, 95% of the world's economies contracted simultaneously, and at one point, the global economy shrank by approximately 20%. Tooze's keen and sobering analysis of what happened is made all the more remarkable by the fact that it came out whilst the pandemic was still unfolding. In fact, this leads quickly to one of the book's few flaws: by being published so quickly, Shutdown prematurely over-praises China's 'zero Covid' policy, and these remarks will make a reader today squirm in their chair. Still, despite the regularity of these references (after all, mentioning China is very useful when one is directly comparing economic figures in early 2021, for example), these are actually minor blemishes on the book's overall thesis. That is to say, Shutdown is not merely a retelling of what happened in such-and-such a country during the pandemic; it offers in effect a prediction about what might be coming next. Whilst the economic responses to Covid averted what could easily have been another Great Depression (and thus showed it had learned some lessons from 2008), it had only done so by truly discarding the economic rule book.
The by-product of inverting this set of written and unwritten conventions that have governed the world for the past 50 years, this 'Washington consensus' if you will, has yet to be fully felt. Of course, there are many parallels between these two books by Tooze. Both the liquidity crisis outlined in Crashed and the economic response to Covid in Shutdown exposed the fact that one of the central tenets of the modern economy, i.e. that financial markets can be trusted to regulate themselves, was entirely untrue, and likely was false from the very beginning. And whilst Adam Tooze does not offer a singular piercing insight (conveying a sense of rigorous mastery instead), he may as well be asking whether we're simply going to lurch along from one crisis to the next, relying on the technocrats in power to fix problems when everything blows up again. The answer may very well be yes.

Looking for the Good War: American Amnesia and the Violent Pursuit of Happiness (2021) Elizabeth D. Samet Elizabeth D. Samet's Looking for the Good War answers the following question: what would be the result if you asked a professor of English to disentangle the complex mythology we have about WW2, in the context of the recent US exit from Afghanistan? Samet's book acts as a twenty-first-century update of a kind to Paul Fussell's two books (reviewed above), as well as a deeper meditation on the idea that each new war is seen through the lens of the previous one. Indeed, like The Great War and Modern Memory (1975) and Wartime (1989), Samet's book is a perceptive work of demystification, but whilst Fussell seems to have been inspired by his own traumatic war experience, Samet is informed not only by her teaching of West Point military cadets but also by the physical and ontological wars that have occurred during her own life. A more scholarly and dispassionate text is the result of Samet's relative distance from armed combat, but it doesn't mean Looking for the Good War lacks energy or inspiration. Samet shares John Adams' belief that no political project can entirely shed the innate corruptions of power and ambition, and so it is crucial to analyse and re-analyse the role of WW2 in contemporary American life. She is surely correct that the Second World War has been universally elevated as a special, 'good' war. Even those with exceptionally giddy minds seem to treat WW2 as hallowed:
It is nevertheless telling that one of the few occasions to which Trump responded with any kind of restraint while he was in office was the 75th anniversary of D-Day in 2019.
What is the source of this restraint, and what has nurtured its growth in the eight decades since WW2 began? Samet posits several reasons for this, including the fact that almost all of the media about the Second World War is not only suffused with symbolism and nostalgia but, less obviously, has been made by people who have no experience of the events that they depict. Take Stephen Ambrose, whose book became Steven Spielberg's Band of Brothers miniseries: "I was 10 years old when the war ended," Samet quotes Ambrose as saying. "I thought the returning veterans were giants who had saved the world from barbarism. I still think so. I remain a hero worshiper." If Looking for the Good War has a primary thesis, then, it is that childhood hero worship is no basis for a system of government, let alone a crusading foreign policy. There is a straight line (to quote this book's subtitle) from the "American Amnesia" that obscures the reality of war to the "Violent Pursuit of Happiness." Samet's book doesn't merely provide a modern appendix to Fussell's two works, however, as it adds further layers and dimensions he overlooked. For example, Samet provides some excellent insight on the role of Western, gangster and superhero movies, and she is especially good when looking at noir films as a kind of kaleidoscopic response to the Second World War:
Noir is a world ruled by bad decisions but also by bad timing. Chance, which plays such a pivotal role in war, bleeds into this world, too.
Samet rightfully weaves the role of women into the narrative as well. Women in film noir are often celebrated as 'independent' and sassy, correctly reflecting their newly-found independence gained during WW2. But these 'liberated' roles are not exactly a ringing endorsement of this independence: the 'femme fatale' and the 'tart', etc., reflect a kind of conditional freedom permitted to women by a post-War culture which is still wedded to an outmoded honour culture. In effect, far from being novel and subversive, these roles for women actually underwrote the ambient cultural disapproval of women's presence in the workforce. Samet later connects this highly-conditional independence with the liberation of Afghan women, which:
is inarguably one of the more palatable outcomes of our invasion, and the protection of women's rights has been invoked on the right and the left as an argument for staying the course in Afghanistan. How easily consequence is becoming justification. How flattering it will be one day to reimagine it as original objective.
Samet has ensured her book has a predominantly US angle as well, for she ends her book with a chapter on the pseudohistorical Lost Cause of the Civil War. The legacy of the Civil War is still visible in the physical phenomena of Confederate statues, but it also exists in deep-rooted racial injustice that has been shrouded in euphemism and other psychological devices for over 150 years. Samet believes that a key part of what drives the American mythology about the Second World War is the way in which it subconsciously cleanses the horrors of brother-on-brother murder that were seen in the Civil War. This is a book that is not only of interest to historians of the Second World War; it is a work for anyone who wishes to understand almost any American historical event, social issue, politician or movie that has appeared since the end of WW2. That is for better or worse everyone on earth.

28 December 2022

Chris Lamb: Favourite books of 2022: Classics

As a follow-up to yesterday's post detailing my favourite works of fiction from 2022, today I'll be listing my favourite fictional works that are typically filed under classics. Books that just missed the cut here include: E. M. Forster's A Room with a View (1908) and his later A Passage to India (1913), both gently nudged out by Forster's superb Howard's End (see below). Giuseppe Tomasi di Lampedusa's The Leopard (1958) also just missed out on a write-up here, but I can definitely recommend it to anyone interested in reading a modern Italian classic.

War and Peace (1867) Leo Tolstoy It's strange to think that there is almost no point in reviewing this novel: who hasn't heard of War and Peace? What more could possibly be said about it now? Still, when I was growing up, War and Peace was always the stereotypical example of the 'impossible book', and even to start it was, at best, a pointless task, and an act of hubris at worst. And so there surely exists a parallel universe in which I never have read, and never will read, the book... Nevertheless, let us try to set the scene. Book nine of the novel opens as follows:
On the twelfth of June, 1812, the forces of Western Europe crossed the Russian frontier and war began; that is, an event took place opposed to human reason and to human nature. Millions of men perpetrated against one another such innumerable crimes, frauds, treacheries, thefts, forgeries, issues of false money, burglaries, incendiarisms and murders as in whole centuries are not recorded in the annals of all the law courts of the world, but which those who committed them did not at the time regard as being crimes. What produced this extraordinary occurrence? What were its causes? [ ] The more we try to explain such events in history reasonably, the more unreasonable and incomprehensible they become to us.
Set against the backdrop of the Napoleonic Wars and Napoleon's invasion of Russia, War and Peace follows the lives and fates of three aristocratic families: the Rostovs, the Bolkonskys and the Bezukhovs. These characters find themselves situated athwart (or against) history, and all this time, Napoleon is marching ever closer to Moscow. Still, Napoleon himself is essentially just a kind of wallpaper for a diverse set of personal stories touching on love, jealousy, hatred, retribution, naivety, nationalism, stupidity and much, much more. As Elif Batuman wrote earlier this year, "the whole premise of the book was that you couldn't explain war without recourse to domesticity and interpersonal relations." The result is that Tolstoy has woven an incredibly intricate web that connects the war, noble families and the everyday Russian people to a degree that is surprising for a book started in 1865. Tolstoy's characters are probably timeless (especially the picaresque adventures and constantly changing thoughts of Pierre Bezukhov), and the reader who has any social experience will immediately recognise characters' thoughts and actions. Some of this is at a 'micro' interpersonal level: for instance, take this example from the elegant party that opens the novel:
Each visitor performed the ceremony of greeting this old aunt whom not one of them knew, not one of them wanted to know, and not one of them cared about. The aunt spoke to each of them in the same words, about their health and her own and the health of Her Majesty, who, thank God, was better today. And each visitor, though politeness prevented his showing impatience, left the old woman with a sense of relief at having performed a vexatious duty and did not return to her the whole evening.
But then, some of the focus of the observations are at the 'macro' level of the entire continent. This section about cities that feel themselves in danger might suffice as an example:
At the approach of danger, there are always two voices that speak with equal power in the human soul: one very reasonably tells a man to consider the nature of the danger and the means of escaping it; the other, still more reasonably, says that it is too depressing and painful to think of the danger, since it is not in man's power to foresee everything and avert the general course of events, and it is therefore better to disregard what is painful till it comes and to think about what is pleasant. In solitude, a man generally listens to the first voice, but in society to the second.
And finally, in his lengthy epilogues, Tolstoy offers us a dissertation on the behaviour of large organisations, much of it through engagingly witty analogies. These epilogues actually turn out to be an oblique and sarcastic commentary on the idiocy of governments and the madness of war in general. Indeed, the thorough dismantling of the 'great man' theory of history is a common theme throughout the book:
During the whole of that period [of 1812], Napoleon, who seems to us to have been the leader of all these movements (as the figurehead of a ship may seem to a savage to guide the vessel), acted like a child who, holding a couple of strings inside a carriage, thinks he is driving it. [ ] Why do [we] all speak of a 'military genius'? Is a man a genius who can order bread to be brought up at the right time and say who is to go to the right and who to the left? It is only because military men are invested with pomp and power, and crowds of sycophants flatter power, attributing to it qualities of genius it does not possess.
Unlike some other readers, I especially enjoyed these diversions into the accounting and workings of history, as well as our narrow-minded way of trying to 'explain' things in a singular way:
When an apple has ripened and falls, why does it fall? Because of its attraction to the earth, because its stalk withers, because it is dried by the sun, because it grows heavier, because the wind shakes it, or because the boy standing below wants to eat it? Nothing is the cause. All this is only the coincidence of conditions in which all vital organic and elemental events occur. And the botanist who finds that the apple falls because the cellular tissue decays and so forth is equally right with the child who stands under the tree and says the apple fell because he wanted to eat it and prayed for it.
Given all of these serious asides, I was also not expecting this book to be quite so funny. At the risk of boring the reader with citations, take this sarcastic remark about the ineptness of medicine men:
After his liberation, [Pierre] fell ill and was laid up for three months. He had what the doctors termed 'bilious fever.' But despite the fact that the doctors treated him, bled him and gave him medicines to drink, he recovered.
There is actually a multitude of remarks that are not entirely complimentary towards Russian medical practice, but they are usually deployed with an eye to the human element involved rather than simply to the detriment of a doctor's reputation: "How would the count have borne his dearly loved daughter's illness had he not known that it was costing him a thousand rubles?" Other elements of note include some stunning literary set pieces, such as when Prince Andrei encounters a gnarly oak tree under two different circumstances in his life, and when Natasha's 'Russian' soul is awakened by the strains of a folk song on the balalaika. Still, despite all of these micro- and macro-level happenings, for a long time I felt that something else was going on in War and Peace. It was difficult to put into words precisely what it was until I came across this passage by E. M. Forster:
After one has read War and Peace for a bit, great chords begin to sound, and we cannot say exactly what struck them. They do not arise from the story [and] they do not come from the episodes nor yet from the characters. They come from the immense area of Russia, over which episodes and characters have been scattered, from the sum-total of bridges and frozen rivers, forests, roads, gardens and fields, which accumulate grandeur and sonority after we have passed them. Many novelists have the feeling for place, [but] very few have the sense of space, and the possession of it ranks high in Tolstoy s divine equipment. Space is the lord of War and Peace, not time.
'Space' indeed. Yes, potential readers should note the novel's great length, but the 365 chapters are actually remarkably short, so the sensation of reading it is not in the least overwhelming. And more importantly, once you become familiar with its large cast of characters, it is really not a difficult book to follow, especially when compared to the other Russian classics. My only regret is that it has taken me so long to read this magnificent novel and that I might find it hard to find time to re-read it within the next few years.

Coming Up for Air (1939) George Orwell It wouldn't be a roundup of mine without at least one entry from George Orwell, and, this year, that place is occupied by a book I haven't read in almost two decades. The George Bowling of Coming Up for Air is a middle-aged insurance salesman who lives in a distinctly average English suburban row house with his nuclear family. One day, after winning some money on a bet, he goes back to the village where he grew up in order to fish in a pool he remembers from thirty years before. Less important than the plot, however, is both the well-observed remarks and scathing criticisms that Bowling has of the town he has returned to, combined with an ominous sense of foreboding before the Second World War breaks out. At several times throughout the book, George's placid thoughts about his beloved carp pool are replaced by racing, anxious thoughts that overwhelm his inner peace:
War is coming. In 1941, they say. And there'll be plenty of broken crockery, and little houses ripped open like packing-cases, and the guts of the chartered accountant's clerk plastered over the piano that he's buying on the never-never. But what does that kind of thing matter, anyway? I'll tell you what my stay in Lower Binfield had taught me, and it was this. IT'S ALL GOING TO HAPPEN. All the things you've got at the back of your mind, the things you're terrified of, the things that you tell yourself are just a nightmare or only happen in foreign countries. The bombs, the food-queues, the rubber truncheons, the barbed wire, the coloured shirts, the slogans, the enormous faces, the machine-guns squirting out of bedroom windows. It's all going to happen. I know it - at any rate, I knew it then. There's no escape. Fight against it if you like, or look the other way and pretend not to notice, or grab your spanner and rush out to do a bit of face-smashing along with the others. But there's no way out. It's just something that's got to happen.
Already we can hear the psychological madness that underpinned the Second World War. Indeed, there is no great story in Coming Up For Air, no wonderfully empathetic characters and no revelations or catharsis, so it is impressive that I was held by the descriptions, observations and nostalgic remembrances about life in modern Lower Binfield, its residents, and how it has changed over the years. It turns out, of course, that George's beloved pool has been filled in with rubbish, and the village has been perverted by modernity beyond recognition. And to cap it off, the principal event of George's holiday in Lower Binfield is an accidental bombing by the British Royal Air Force. Orwell is always good at descriptions of awful food, and this book is no exception:
The frankfurter had a rubber skin, of course, and my temporary teeth weren't much of a fit. I had to do a kind of sawing movement before I could get my teeth through the skin. And then suddenly pop! The thing burst in my mouth like a rotten pear. A sort of horrible soft stuff was oozing all over my tongue. But the taste! For a moment I just couldn't believe it. Then I rolled my tongue around it again and had another try. It was fish! A sausage, a thing calling itself a frankfurter, filled with fish! I got up and walked straight out without touching my coffee. God knows what that might have tasted of.
Many other tell-tale elements of Orwell's fictional writing are in attendance in this book as well, albeit worked out somewhat less successfully than elsewhere in his oeuvre. For example, the idea of a physical ailment also serving as a metaphor is present in George's false teeth, embodying his constant preoccupation with his ageing. (Readers may recall Winston Smith's varicose ulcer representing his repressed humanity in Nineteen Eighty-Four). And, of course, we have a prematurely middle-aged protagonist who almost but not quite resembles Orwell himself. Given this and a few other niggles (such as almost all the women being of the typical Orwell 'nagging wife' type), it is not exactly Orwell's magnum opus. But it remains a fascinating historical snapshot of the feeling felt by a vast number of people just prior to the Second World War breaking out, as well as a captivating insight into how the process of nostalgia functions and operates.

Howards End (1910) E. M. Forster Howards End begins with the following sentence:
One may as well begin with Helen s letters to her sister.
In fact, "one may as well begin with" my own assumptions about this book instead. I was actually primed to consider Howards End a much more 'Victorian' book: I had just finished Virginia Woolf's Mrs Dalloway and had found her 1925 book at once rather 'modern' but also very much constrained by its time. I must have then unconsciously surmised that a book written 15 years before would be even more inscrutable, and, with its Victorian social mores added on as well, Howards End would probably not undress itself so readily in front of the reader. No doubt there were also the usual expectations about 'the classics' as well. So imagine my surprise when I realised just how inordinately affable and witty Howards End turned out to be. It doesn't have that Wildean shine of humour, of course, but it's a couple of fields over in the English countryside, perhaps abutting the more mordant social satires of the earlier George Orwell novels (see Coming Up for Air above). But now let us return to the story itself. Howards End explores class warfare, conflict and the English character through a tale of three quite different families at the beginning of the twentieth century: the rich Wilcoxes; the gentle & idealistic Schlegels; and the lower-middle class Basts. As the Bloomsbury Group Schlegel sisters desperately try to help the Basts and educate the rich but close-minded Wilcoxes, the three families are drawn ever closer and closer together. Although the whole story does, I suppose, revolve around the house in the title (which is based on the Forster's own childhood home), Howards End is perhaps best described as a comedy of manners or a novel that shows up the hypocrisy of people and society. In fact, it is surprising how little of the story actually takes place in the eponymous house, with the overwhelming majority of the first half of the book taking place in London. But it is perhaps more illuminating to remark that the Howards End of the book is a house that the Wilcoxes who own it at the start of the novel do not really need or want. What I particularly liked about Howards End is how the main character's ideals alter as they age, and subsequently how they find their lives changing in different ways. Some of them find themselves better off at the end, others worse. And whilst it is also surprisingly funny, it still manages to trade in heavier social topics as well. This is apparent in the fact that, although the characters themselves are primarily in charge of their own destinies, their choices are still constrained by the changing world and shifting sense of morality around them. This shouldn't be too surprising: after all, Forster's novel was published just four years before the Great War, a distinctly uncertain time. Not for nothing did Virginia Woolf herself later observe that "on or about December 1910, human character changed" and that "all human relations have shifted: those between masters and servants, husbands and wives, parents and children." This process can undoubtedly be seen rehearsed throughout Forster's Howards End, and it's a credit to the author to be able to capture it so early on, if not even before it was widespread throughout Western Europe. I was also particularly taken by Forster's fertile use of simile. An extremely apposite example can be found in the description Tibby Schlegel gives of his fellow Cambridge undergraduates. 
Here, Tibby doesn't want to besmirch his lofty idealisation of them with any banal specificities, and wishes that the idea of them remain as ideal Platonic forms instead. Or, as Forster puts it, to Tibby it is as if they are "pictures that must not walk out of their frames." Wilde, at his weakest, is 'just' style, but Forster often deploys his flair for a deeper effect. Indeed, when you get to the end of this section mentioning picture frames, you realise Forster has actually just smuggled into the story a failed attempt on Tibby's part to engineer an anonymous homosexual encounter with another undergraduate. It is a credit to Forster's sleight-of-hand that you don't quite notice what has just happened underneath you, and the book's reticence to honestly describe what has happened is thus structurally analogous to Tibby's reluctance to admit his desires to himself. Another layer to the character of Tibby (and the novel as a whole) is thereby introduced without the imposition of clumsy literary scaffolding. In a similar vein, I felt very clever noticing the arch reference to Debussy's Prélude à l'après-midi d'un faune, until I realised I had just fallen into the trap Forster set for the reader, in that I had become even more like Tibby in his pseudo-scholarly views on classical music. Finally, I enjoyed that each chapter commences with an ironic and self-conscious bon mot about society which is only slightly overblown for effect. Particularly amusing are the ironic asides on "women" that run through the book, ventriloquising the narrow-minded views of people like the Wilcoxes. The omniscient and amiable narrator of the book also recalls those ironically distant voiceovers from various French New Wave films at times, yet Forster's narrator seems to have bigger concerns in his mordant asides: Forster seems to encourage some sympathy for all of the characters, even the more contemptible ones, at their worst moments. Highly recommended, as are Forster's A Room with a View (1908) and his slightly later A Passage to India (1913).

The Good Soldier (1915) Ford Madox Ford The Good Soldier starts off fairly simply as the narrator's account of his and his wife's relationship with some old friends, including the eponymous 'Good Soldier' of the book's title. It's an experience to read the beginning of this novel, as, like any account of endless praise of someone you've never met nor care about, the pages of approving remarks about them appear to be intended to wash over you. Yet as the chapters of The Good Soldier go by, the account of the other characters in the book gets darker and darker. Although the narrator himself is uncritical of others' actions, your own critical faculties are slowly brought into play, and you gradually begin to question the narrator's retelling of events. Our narrator is an unreliable narrator in the strict sense of the term, but with the caveat that he is at least telling us everything we need to know to come to our own conclusions. As the book unfolds further, the narrator's compromised credibility seems to infuse every element of the novel: even the 'Good' of the book's title starts to seem like a minor dishonesty, perhaps serving as the inspiration for the irony embedded in the title of The 'Great' Gatsby. Much more effectively, however, the narrator's fixations, distractions and manner of speaking feel very much part of his dissimulation. It sometimes feels like he is unconsciously skirting over the crucial elements in his tale, exactly like one does in real life when recounting a story containing incriminating ingredients. Indeed, just how much the narrator is conscious of his own concealment is just one part of what makes this such an interesting book: Ford Madox Ford has gifted us with enough ambiguity that it is also possible that even the narrator cannot find it within himself to understand the events of the story he is narrating. It was initially hard to believe that such a carefully crafted analysis of a small group of characters could have been written so long ago, and despite being fairly easy to read, The Good Soldier is an almost infinitely subtle book (even the jokes are of the subtle kind) and will likely get a re-read within the next few years.

Anna Karenina (1878) Leo Tolstoy There are many similar themes running through War and Peace (reviewed above) and Anna Karenina. Unrequited love; a young man struggling to find a purpose in life; a loving family; an overwhelming love of nature and countless fascinating observations about the minutiae of Russian society. Indeed, rather than primarily being about the eponymous Anna, Anna Karenina provides a vast panorama of contemporary life in Russia and of humanity in general. Nevertheless, our Anna is a sophisticated woman who abandons her empty existence as the wife of government official Alexei Karenin, a colourless man who has little personality of his own, and she turns to a certain Count Vronsky in order to fulfil her passionate nature. Needless to say, this results in tragic consequences as their (admittedly somewhat qualified) desire to live together crashes against the rocks of reality and Russian society. Parallel to Anna's narrative, though, Konstantin Levin serves as the novel's alter-protagonist. In contrast to Anna, Levin is a socially awkward individual who straddles many schools of thought within Russia at the time: he is neither a free-thinker (nor heavy-drinker) like his brother Nikolai, and neither is he a bookish intellectual like his half-brother Serge. In short, Levin is his own man, and it is generally agreed by commentators that he is Tolstoy's surrogate within the novel. Levin tends to come to his own version of an idea, and he would rather find his own way than adopt any prefabricated view, even if confusion and muddle is the eventual result. In this sense, then, he roughly mirrors Anna, whose story is a counterpart to Levin's in their respective searches for happiness and self-actualisation. Whilst many of the passionate and exciting passages are told on Anna's side of the story (I'm thinking of the horse race in particular, as thrilling as anything in cinema), many of the broader political thoughts about the nature of the working classes are expressed on Levin's side instead. These are stirring and engaging in their own way, though, such as when he joins his peasants to mow the field and seems to enter the nineteenth-century version of 'flow':
The longer Levin mowed, the more often he felt those moments of oblivion during which it was no longer his arms that swung the scythe, but the scythe itself that lent motion to his whole body, full of life and conscious of itself, and, as if by magic, without a thought of it, the work got rightly and neatly done on its own. These were the most blissful moments.
Overall, Tolstoy poses no didactic moral message towards any of the characters in Anna Karenina, and merely invites us to watch rather than judge. (Still, there is a hilarious section that is scathing of contemporary classical music, presaging many of the ideas found in Tolstoy's 1897 What is Art?). In addition, just like the earlier War and Peace, the novel is run through with a number of uncannily accurate observations about daily life:
Anna smiled, as one smiles at the weaknesses of people one loves, and, putting her arm under his, accompanied him to the door of the study.
... as well as the usual sprinkling of Tolstoy's sardonic humour ("No one is pleased with his fortune, but everyone is pleased with his wit."). Fyodor Dostoyevsky, the other titan of Russian literature, once described Anna Karenina as a "flawless work of art," and if you're only going to read one Tolstoy novel in your life, it should probably be this one.

Russell Coker: Links December 2022

Charles Stross wrote an informative summary of the problems with the UK monarchy [1], conveniently before the queen died. The blog post To The Next Mass Shooter, A Modest Proposal is a well written suggestion to potential mass murderers [2]. The New Yorker has an interesting and amusing article about the former CIA employee who released the Vault 7 collection of CIA attack software [3]. This exposes the ridiculously poor hiring practices of the CIA, which involved far fewer background checks than the reporter writing the story did. Wired has an interesting 6-part series about the hunt for Alpha02, the admin of the Alphabay dark web marketplace [4]. The Atlantic has an interesting and informative article about Marjorie Taylor Greene, one of the most horrible politicians in the world [5]. Anarcat wrote a long and detailed blog post about Matrix [6]. It's mostly about comparing Matrix to other services and analysing the overall environment of IM systems. I recommend using Matrix; it is quite good, although having a server with SSD storage is required for the database. Edent wrote an interesting thought experiment on how one might try to regain access to all their digital data if a lightning strike destroyed everything in their home [7]. Cory Doctorow wrote an interesting article about the crapification of literary contracts [8]. A lot of this applies to most contracts between corporations and individuals. We need legislation to restrict corporations from such abuse. Jared A Brock wrote an insightful article about why AirBNB is horrible and how it will fail [9]. Habr has an interesting article on circumventing UEFI secure boot [10]. This doesn't make secure boot worthless but does expose some weaknesses in it. Matthew Garrett wrote an interesting blog post about stewardship of the UEFI boot ecosystem and how Microsoft has made some strange and possibly hypocritical decisions about it [11]. It also has a lot of background information on how UEFI can be used and misused. Cory Doctorow wrote an interesting article Let's Make Amazon Into a Dumb Pipe [12]. The idea is to use the Amazon search and reviews to find a product and then buy it elsewhere, a reverse of the showrooming practice where people look at products in stores and buy them online. There is already a browser plugin to search local libraries for Amazon books. Charles Stross wrote an interesting blog post about the UK Tory plan to destroy higher education [13]. There are a lot of similarities to what conservatives are doing in other countries. Antoine Beaupré wrote an insightful blog post How to nationalize the internet in Canada [14]. They cover the technical issues to be addressed as well as some social justice points that are often missed when discussing such issues. The internet is not a luxury nowadays; it's an important part of daily life, and governments need to treat it the same way as roads and other national infrastructure.

18 December 2022

Ian Jackson: Rust needs #[throws]

tl;dr: Ok-wrapping as needed in today's Rust is a significant distraction, because there are multiple ways to do it. They are all slightly awkward in different ways, so are least-bad in different situations. You must choose a way for every fallible function, and sometimes change a function from one pattern to another. Rust really needs #[throws] as a first-class language feature. Code using #[throws] is simpler and clearer. Please try out withoutboats's fehler. I think you will like it.

A recent personal experience in coding style

Ever since I read withoutboats's 2020 article about fehler, I have been using it in most of my personal projects. For Reasons, I recently had a go at eliminating the dependency on fehler from Hippotat. So, I made a branch, deleted the dependency and imports, and started on the whack-a-mole with the compiler errors. After about a half hour of this, I was starting to feel queasy. After an hour I had decided that basically everything I was doing was making the code worse. And, bizarrely, I kept having to make individual decisions about what idiom to use in each place. I couldn't face it any more. After sleeping on the question I decided that Hippotat would be in Debian with fehler, or not at all. Happily the Debian Rust Team generously helped me out, so the answer is that fehler is now in Debian, so it's fine. For me this experience, of trying to convert Rust-with-#[throws] to Rust-without-#[throws], brought the Ok-wrapping problem into sharp focus.

What is Ok wrapping? Intro to Rust error handling

(You can skip this section if you're already a seasoned Rust programmer.) In Rust, fallibility is represented by functions that return Result<SuccessValue, Error>: this is a generic type, representing either whatever SuccessValue is (in the Ok variant of the data-bearing enum) or some Error (in the Err variant). For example, std::fs::read_to_string, which takes a filename and returns the contents of the named file, returns Result<String, std::io::Error>. This is a nice and typesafe formulation of, and generalisation of, the traditional C practice, where a function indicates in its return value whether it succeeded, and errors are indicated with an error code. Result is part of the standard library and there are convenient facilities for checking for errors, extracting successful results, and so on. In particular, Rust has the postfix ? operator, which, when applied to a Result, does one of two things: if the Result was Ok, it yields the inner successful value; if the Result was Err, it returns early from the current function, returning an Err in turn to the caller. This means you can write things like this:
    let input_data = std::fs::read_to_string(input_file)?;
and the error handling is pretty automatic. You get a compiler warning, or a type error, if you forget the ?, so you can't accidentally ignore errors. But, there is a downside. When you are returning a successful outcome from your function, you must convert it into a Result. After all, your fallible function has return type Result<SuccessValue, Error>, which is a different type to SuccessValue. So, for example, inside std::fs::read_to_string, we see this:
        let mut string = String::new();
        file.read_to_string(&mut string)?;
        Ok(string)
    }
string has type String; fs::read_to_string must return Result<String, ..>, so at the end of the function we must return Ok(string). This applies to return statements, too: if you want an early successful return from a fallible function, you must write return Ok(whatever). This is particularly annoying for functions that don't actually return a nontrivial value. Normally, when you write a function that doesn't return a value, you don't write the return type. The compiler interprets this as syntactic sugar for -> (), i.e., that the function returns (), the empty tuple, used in Rust as a dummy value in these kinds of situations. A block { ... } whose last statement ends in a ; has type (). So, when you fall off the end of a function, the return value is (), without you having to write it. So you simply leave out the stuff in your program about the return value, and your function doesn't have one (i.e. it returns ()). But, a function which either fails with an error, or completes successfully without returning anything, has return type Result<(), Error>. At the end of such a function, you must explicitly provide the success value. After all, if you just fall off the end of a block, it means the block has value (), which is not of type Result<(), Error>. So the fallible function must end with Ok(()), just as the std::fs::read_to_string example above has to end with Ok(string).

A minor inconvenience, or a significant distraction?

I think the need for Ok-wrapping on all success paths from fallible functions is generally regarded as just a minor inconvenience. Certainly the experienced Rust programmer gets very used to it. However, while trying to remove fehler's #[throws] from Hippotat, I noticed something that is evident in codebases using vanilla Rust (without fehler) but which goes un-remarked. There are multiple ways to write the Ok-wrapping, and the different ways are appropriate in different situations. See the following examples, all taken from a real codebase. (And it's not just me: I do all of these in different places, when I don't have fehler available, but all these examples are from code written by others.)

Idioms for Ok-wrapping - a bestiary

Wrap just a returned variable binding: If you have the return value in a variable, you can write Ok(retval) at the end of the function, instead of retval.
    pub fn take_until(&mut self, term: u8) -> Result<&'a [u8]> {
        // several lines of code
        Ok(result)
    }
If the returned value is not already bound to a variable, making a function fallible might mean choosing to bind it to a variable. Wrap a nontrivial return expression: Even if it's not just a variable, you can wrap the expression which computes the returned value. This is often done if the returned value is a struct literal:
    fn take_from(r: &mut Reader<'_>) -> Result<Self> {
        // several lines of code
        Ok(AuthChallenge { challenge, methods })
    }
Introduce Ok(()) at the end: For functions returning Result<()>, you can write Ok(()). This is usual, but not ubiquitous, since sometimes you can omit it. Wrap the whole body: If you don't have the return value in a variable, you can wrap the whole body of the function in Ok( ... ). Whether this is a good idea depends on how big and complex the body is.
    fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
        Ok(match s {
            "Authority" => RelayFlags::AUTHORITY,
            // many other branches
            _ => RelayFlags::empty(),
        })
    }
Omit the wrap when calling fallible sub-functions: If your function wraps another function call of the same return and error type, you don't need to write the Ok at all. Instead, you can simply call the function and not apply ?. You can do this even if your function selects between a number of different sub-functions to call:
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if flags::unsafe_logging_enabled() {
            std::fmt::Display::fmt(&self.0, f)
        } else {
            self.0.display_redacted(f)
        }
    }
But this doesn't work if the returned error type isn't the same, but needs the autoconversion implied by the ? operator. Convert a fallible sub-function error with Ok( ... ?): If the final thing a function does is chain to another fallible function, but with a different error type, the error must be converted somehow. This can be done with ?.
    fn try_from(v: i32) -> Result<Self, Error> {
        Ok(Percentage::new(v.try_into()?))
    }
Convert a fallible sub-function error with .map_err Or, rarely, people solve the same problem by converting explicitly with .map_err:
    pub fn create_unbootstrapped(self) -> Result<TorClient<R>> {
        // several lines of code
        TorClient::create_inner(
            // several parameters
        )
        .map_err(ErrorDetail::into)
    }
What is to be done, then? The fehler library is in excellent taste and has the answer. With fehler, you mark a fallible function with #[throws(Error)]: the signature then names only the success type, the success value is returned directly with no Ok(...) at all, errors are raised with the throw! macro, and ? keeps working as before. This is precisely correct. It is very ergonomic. The consequences can be seen in the appendix below: shorter signatures, no wrapping on success paths, and early returns that read naturally.

Limitations of fehler

But, fehler is a Rust procedural macro, so it cannot get everything right. Sadly there are some wrinkles. But, Rust-with-#[throws] is so much nicer a language than Rust-with-mandatory-Ok-wrapping, that these are minor inconveniences.

Please can we have #[throws] in the Rust language?

This ought to be part of the language, not a macro library. In the compiler, it would be possible to get all the corner cases right. It would make the feature available to everyone, and it would quickly become idiomatic Rust throughout the community. It is evident from reading writings from the time, particularly those from withoutboats, that there were significant objections to automatic Ok-wrapping. It seems to have become quite political, and some folks burned out on the topic. Perhaps, now, a couple of years later, we can revisit this area and solve this problem in the language itself?

Explicitness

An argument I have seen made against automatic Ok-wrapping, and, in general, against any kind of useful language affordance, is that it makes things less explicit. But this argument is fundamentally wrong for Ok-wrapping. Explicitness is not an unalloyed good. We humans have only limited attention. We need to focus that attention where it is actually needed. So explicitness is good in situations where what is going on is unusual; or would otherwise be hard to read; or is tricky or error-prone. Generally: explicitness is good for things where we need to direct humans' attention. But Ok-wrapping is ubiquitous in fallible Rust code. The compiler mechanisms and type systems almost completely defend against mistakes. All but the most novice programmer knows what's going on, and the very novice programmer doesn't need to. Rust's error handling arrangements are designed specifically so that we can avoid worrying about fallibility unless necessary, except for the Ok-wrapping. Explicitness about Ok-wrapping directs our attention away from whatever other things the code is doing: it is a distraction. So, explicitness about Ok-wrapping is a bad thing.

Appendix - examples showing code with Ok wrapping is worse than code using #[throws]

Observe these diffs, from my abandoned attempt to remove the fehler dependency from Hippotat. I have a type alias AE for the usual error type (AE stands for anyhow::Error). In the non-#[throws] code, I end up with a type alias AR<T> for Result<T, AE>, which I think is more opaque but at least that avoids typing out -> Result< , AE> a thousand times. Some people like to have a local Result alias, but that means that the standard Result has to be referred to as StdResult or std::result::Result.
Each example below shows the version with fehler and #[throws] first, followed by the vanilla Rust version with Result<> and mandatory Ok-wrapping.

Return value clearer, error return less wordy. With fehler and #[throws]:

    impl Parseable for Secret {
        #[throws(AE)]
        fn parse(s: Option<&str>) -> Self {
            let s = s.value()?;
            if s.is_empty() { throw!(anyhow!("secret value cannot be empty")) }
            Secret(s.into())
        }
    }

Vanilla Rust:

    impl Parseable for Secret {
        fn parse(s: Option<&str>) -> AR<Self> {
            let s = s.value()?;
            if s.is_empty() { return Err(anyhow!("secret value cannot be empty")) }
            Ok(Secret(s.into()))
        }
    }
No need to wrap the whole match statement in Ok( ... ). With fehler and #[throws]:

    #[throws(AE)]
    pub fn client<T>(&self, key: &'static str, skl: SKL) -> T
        where T: Parseable + Default
    {
        match self.end {
            LinkEnd::Client => self.ordinary(key, skl)?,
            LinkEnd::Server => default(),
        }
    }

Vanilla Rust:

    pub fn client<T>(&self, key: &'static str, skl: SKL) -> AR<T>
        where T: Parseable + Default
    {
        Ok(match self.end {
            LinkEnd::Client => self.ordinary(key, skl)?,
            LinkEnd::Server => default(),
        })
    }
Return value and Ok(()) entirely replaced by #[throws]. With fehler and #[throws]:

    impl Display for Loc {
        #[throws(fmt::Error)]
        fn fmt(&self, f: &mut fmt::Formatter) {
            write!(f, /* ... */, &self.file, self.lno)?;
            if let Some(s) = &self.section {
                write!(f, /* ... */)?;
            }
        }
    }

Vanilla Rust:

    impl Display for Loc {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, /* ... */, &self.file, self.lno)?;
            if let Some(s) = &self.section {
                write!(f, /* ... */)?;
            }
            Ok(())
        }
    }
The call to write! now looks the same as in the more complex case shown above. With fehler and #[throws]:

    impl Debug for Secret {
        #[throws(fmt::Error)]
        fn fmt(&self, f: &mut fmt::Formatter) {
            write!(f, "Secret(***)")?;
        }
    }

Vanilla Rust:

    impl Debug for Secret {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "Secret(***)")
        }
    }
Much tiresome return Ok( ... ) noise removed. With fehler and #[throws]:

    impl FromStr for SectionName {
        type Err = AE;
        #[throws(AE)]
        fn from_str(s: &str) -> Self {
            match s {
                COMMON => return SN::Common,
                LIMIT => return SN::GlobalLimit,
                _ => { }
            };
            if let Ok(n @ ServerName(_)) = s.parse() { return SN::Server(n) }
            if let Ok(n @ ClientName(_)) = s.parse() { return SN::Client(n) }
            if client == LIMIT { return SN::ServerLimit(server) }
            let client = client.parse().context("client name in link section name")?;
            SN::Link(LinkName { server, client })
        }
    }

Vanilla Rust:

    impl FromStr for SectionName {
        type Err = AE;
        fn from_str(s: &str) -> AR<Self> {
            match s {
                COMMON => return Ok(SN::Common),
                LIMIT => return Ok(SN::GlobalLimit),
                _ => { }
            };
            if let Ok(n @ ServerName(_)) = s.parse() { return Ok(SN::Server(n)) }
            if let Ok(n @ ClientName(_)) = s.parse() { return Ok(SN::Client(n)) }
            if client == LIMIT { return Ok(SN::ServerLimit(server)) }
            let client = client.parse().context("client name in link section name")?;
            Ok(SN::Link(LinkName { server, client }))
        }
    }
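For readers who have not used fehler before, here is a minimal, self-contained sketch of what the two macros look like in practice. It is not taken from Hippotat: the parse_positive function and its error message are made up for illustration, and it assumes the fehler and anyhow crates as dependencies.

    use anyhow::{anyhow, Error};
    use fehler::{throw, throws};

    // The nominal return type is the *success* type; the #[throws] attribute
    // rewrites the signature to Result<u32, Error> behind the scenes.
    #[throws(Error)]
    fn parse_positive(s: &str) -> u32 {
        let n: u32 = s.trim().parse()?;                // ? still converts and propagates errors
        if n == 0 {
            throw!(anyhow!("value must be positive")); // explicit error return
        }
        n                                              // success value, with no Ok(...) wrapping
    }

    fn main() {
        // Callers still see an ordinary Result<u32, anyhow::Error>.
        match parse_positive("42") {
            Ok(n) => println!("parsed {}", n),
            Err(e) => eprintln!("error: {}", e),
        }
    }

Deleting the #[throws(Error)] attribute and changing the return type to Result<u32, Error> by hand immediately brings back exactly the Ok(n) and return Err(...) noise that the diffs above illustrate.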
edited 2022-12-18 19:58 UTC to improve, and 2022-12-18 23:28 to fix, formatting



12 December 2022

Russ Allbery: Review: The Unbroken

Review: The Unbroken, by C.L. Clark
Series: Magic of the Lost #1
Publisher: Orbit
Copyright: March 2021
ISBN: 0-316-54267-9
Format: Kindle
Pages: 490
The Unbroken is the first book of a projected fantasy trilogy. It is C.L. Clark's first novel. Lieutenant Touraine is one of the Sands, the derogatory name for the Balladairan Colonial Brigade. She, like the others of her squad, are conscript soldiers, kidnapped by the Balladairan Empire from their colonies as children and beaten into "civilized" behavior by Balladairan training. They fought in the Balladairan war against the Taargens. Now, they've been reassigned to El-Wast, capital city of Qaz l, the foremost of the southern colonies. The place where Touraine was born, from which she was taken at the age of five. Balladaire is not France and Qaz l is not Algeria, but the parallels are obvious and strongly implied by the map and the climates. Touraine and her squad are part of the forces accompanying Princess Luca, the crown princess of the Balladairan Empire, who has been sent to take charge of Qaz l and quell a rebellion. Luca's parents died in the Withering, the latest round of a recurrent plague that haunts Balladaire. She is the rightful heir, but her uncle rules as regent and is reluctant to give her the throne. Qaz l is where she is to prove herself. If she can bring the colony in line, she can prove that she's ready to rule: her birthright and her destiny. The Qaz li are uninterested in being part of Luca's grand plan of personal accomplishment. She steps off her ship into an assassination attempt, foiled by Touraine's sharp eyes and quick reactions, which brings the Sand to the princess's attention. Touraine's reward is to be assigned the execution of the captured rebels, one of whom recognizes her and names her mother before he dies. This sets up the core of the plot: Qaz li rebellion against an oppressive colonial empire, Luca's attempt to use the colony as a political stepping stone, and Touraine caught in between. One of the reasons why I am happy to see increased diversity in SFF authors is that the way we tell stories is shaped by our cultural upbringing. I was taught to tell stories about colonialism and rebellion in a specific ideological shape. It's hard to describe briefly, but the core idea is that being under the rule of someone else is unnatural as well as being an injustice. It's a deviation from the way the world should work, something unexpected that is inherently unstable. Once people unite to overthrow their oppressors, eventual success is inevitable; it's not only right or moral, it's the natural path of history. This is what you get when you try to peel the supremacy part away from white supremacy but leave the unshakable self-confidence and bedrock assumption that the universe cares what we think. We were also taught that rebellion is primarily ideological. One may be motivated by personal injustice, but the correct use of that injustice is to subsume it into concepts such as freedom and democracy. Those concepts are more "real" in some foundational sense, more central to the right functioning of the world, than individual circumstance. When the now-dominant group tells stories of long-ago revolution, there is no personal experience of oppression and survival in which to ground the story; instead, it's linked to anticipatory fear in the reader, to the idea that one's privileges could be taken away by a foreign oppressor and that the counter to this threat is ideological unity. Obviously, not every white fantasy author uses this story shape, but the tendency runs deep because we're taught it young. 
You can see it everywhere in fantasy, from Lord of the Rings to Tigana. The Unbroken uses a much different story shape, and I don't think it's a coincidence that the author is Black. Touraine is not sympathetic to the Qaz li. These are not her people and this is not her life. She went through hell in Balladairan schools, but she won a place, however tenuous. Her personal role model is General Cantic, the Balladairan Blood General who was also one of her instructors. Cantic is hard as nails, unforgiving, unbending, and probably a war criminal, but also the embodiment of a military ethic. She is tough but fair with the conscript soldiers. She doesn't put a stop to their harassment by the regular Balladairan troops, but neither does she let it go too far. Cantic has power, she knows how to keep it, and there is a place for Touraine in Cantic's world. And, critically, that place is not just hers: it's one she shares with her squad. Touraine's primary loyalty is not to Balladaire or to Qaz l. It's to the Sands. Her soldiers are neither one thing nor the other, and they disagree vehemently among themselves about what Qaz l and their other colonial homes should be to them, but they learned together, fought together, and died together. That theme is woven throughout The Unbroken: personal bonds, third and fourth loyalties, and practical ethics of survival that complicate and contradict simple dichotomies of oppressor and oppressed. Touraine is repeatedly offered ideological motives that the protagonist in the typical story shape would adopt. And she repeatedly rejects them for personal bonds: trying to keep her people safe, in a world that is not looking out for them. The consequence is that this book tears Touraine apart. She tries to walk a precarious path between Luca, the Qaz li, Cantic, and the Sands, and she falls off that path a lot. Each time I thought I knew where this book was going, there's another reversal, often brutal. I tend to be a happily-ever-after reader who wants the protagonist to get everything they need, so this isn't my normal fare. The amount of hell that Touraine goes through made for difficult reading, worse because much of it is due to her own mistakes or betrayals. But Clark makes those decisions believable given the impossible position Touraine is in and the lack of role models she has for making other choices. She's set up to fail, and the price of small victories is to have no one understand the decisions that she makes, or to believe her motives. Luca is the other viewpoint character of the book (and yes, this is also a love affair, which complicates both of their loyalties). She is the heroine of a more typical genre fantasy novel: the outsider princess with a physical disability and a razor-sharp mind, ambitious but fair (at least in her own mind), with a trusted bodyguard advisor who also knew her father and a sincere desire to be kinder and more even-handed in her governance of the colony. All of this is real; Luca is a protagonist, and the reader is not being set up to dislike her. But compared to Touraine's grappling with identity, loyalty, and ethics, Luca is never in any real danger, and her concerns start to feel too calculated and superficial. It's hard to be seriously invested in whether Luca proves herself or gets her throne when people are being slaughtered and abused. This, I think, is the best part of this book. 
Clark tells a traditional ideological fantasy of learning to be a good ruler, but she puts it alongside a much deeper and more complex story of multi-faceted oppression. She has the two protagonists fall in love with each other and challenges them to understand each other, and Luca does not come off well in this comparison. Touraine is frustrated, impulsive, physical, and sometimes has catastrophically poor judgment. Luca is analytical and calculating, and in most ways understands the political dynamics far better than Touraine. We know how this story usually goes: Luca sees Touraine's brilliance and lifts her out of the ranks into a role of importance and influence, which Touraine should reward with loyalty. But Touraine's world is more real, more grounded, and more authentic, and both Touraine and the reader know what Luca could offer is contingent and comes with a higher price than Luca understands. (Incidentally, the cover of The Unbroken, designed by Lauren Panepinto with art by Tommy Arnold, is astonishingly good at capturing both Touraine's character and the overall feeling of the book. Here's a larger version.) The writing is good but uneven. Clark loves reversals, and they did keep me reading, but I think there were too many of them. By the end of the book, the escalation of betrayals and setbacks was more exhausting than exciting, and I'd stopped trusting anything good would last. (Admittedly, this is an accurate reflection of how Touraine felt.) Touraine's inner monologue also gets a bit repetitive when she's thrashing in the jaws of an emotional trap. I think some of this is first-novel problems of over-explaining emotional states and character reasoning, but these problems combine to make the book feel a bit over-long. I'm also not in love with the ending. It's perhaps the one place in the book where I am more cynical about the politics than Clark is, although she does lay the groundwork for it. But this book is also full of places small and large where it goes a different direction than most fantasy and is better for it. I think my favorite small moment is Touraine's quiet refusal to defend herself against certain insinuations. This is such a beautiful bit of characterization; she knows she won't be believed anyway, and refuses to demean herself by trying. I'm not sure I can recommend this book unconditionally, since I think you have to be in the mood for it, but it's one of the most thoughtful and nuanced looks at colonialism and rebellion I can remember seeing in fantasy. I found it frustrating in places, but I'm also still thinking about it. If you're looking for a political fantasy with teeth, you could do a lot worse, although expect to come out the other side a bit battered and bruised. Followed by The Faithless, and I have no idea where Clark is going to go with the second book. I suppose I'll have to read and find out. Content note: In addition to a lot of violence, gore, and death, including significant character death, there's also a major plague. If you're not feeling up to reading about panic caused by contageous illness, proceed with caution. Rating: 7 out of 10

23 November 2022

Jonathan Dowland: eventual consistency

While reading some documentation about using hledger, a particular passage jumped out at me:
It should be easy to work towards eventual consistency. I should be able to do them bit by little bit, leaving things half-done, and picking them up later with little (mental) effort. Eventually my records would be perfect and consistent.
This has been an approach I've used for many things in my life for a long time. It has something in common with "eat the elephant one mouthful at a time". I think there are some categories of problems that you can't solve this way: perhaps because with this approach you are always stuck in a local minimum or maximum. On the other hand, I often feel that there's no way I can address some problems at all without doing so in the smallest of steps. Personal finance is definitely one of those.
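As a purely illustrative sketch (the account names and amounts are made up, not from a real journal), this is roughly what "bit by little bit" can look like in an hledger journal: record the payment now with a placeholder category, and tidy it up on a later pass. hledger infers the missing amount on the bank posting, so even the half-done version balances:

; first pass: just capture the payment, categorise it later
2022-11-01 supermarket
    expenses:unknown             £25   ; TODO: split groceries vs household
    assets:current:bank

; a later pass: the same entry, now tidied up
2022-11-01 supermarket
    expenses:groceries           £18
    expenses:household            £7
    assets:current:bank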

14 October 2022

Shirish Agarwal: Dowry, Racism, Railways

Dowry A few days back, I had posted about the movie Raksha Bandhan and whatever I felt about it. Sadly, just a couple of days back, somebody shared this link. Part of me was shocked and part of me was not. A couple of acquaintances of mine had said the same thing for their daughters in the past. And in such situations you are generally left speechless, because you don't know what the right thing to do is. If he has shared it with you, an outsider, how many times must he have told the same to his wife and daughters? And from what little I have gathered in life, many people have justified it on similar lines. And while there were protests, sadly the book was not removed. Now if nurses are reading such literature, you can tell how their thought processes might be forming :(. And these are the ones whom we call for when we are sick and tired :(. And I have not taken into account how the girls/women themselves might be feeling. There are similar things in another country, though probably not the same, nor with the same motivations, although the feeling of helplessness in both would be a common thing. But such statements are not alone. Another gentleman, in a slightly different context, shared this as well.
The above is a statement shared in a book recommended for CTET (the Central Teacher's Eligibility Test, which became mandatory to take when the RTE (Right To Education) Act came in). The statement says: "People from cold places are white, beautiful, well-built, healthy and wise. And people from hot places are black, irritable and of violent nature." Now, while I can agree with one part of the statement, that people residing in colder regions are fairer than others, there are loads of other factors that determine fairness or skin colour/skin pigmentation. After a bit of searching, I came to know that this and similar articulations have been made in an idea/work called Environmental Determinism. Now if you look at that page, you will realize this is what colonialism was and is all about: the idea that the white man had a god-given right to rule over others. Similarly, if you are fair, you can lord over others. It seems simplistic, but it has a powerful hold on many people in India. Forget the common man, this thinking was and is applicable to some of our better-known freedom fighters. Pune's own Bal Gangadhar Tilak wrote The Arctic Home in the Vedas. It sort of talks about Aryans and how they invaded India and settled here. I haven't read the book, nor do I have access to it, so I have to rely on third-party sources. The reason I'm sharing all this is that the right-wing has been doing this myth-making for some time now, and unless and until you put a light on it, it will continue to perpetuate. For those who have read this blog, you know that India is and has been steeped in casteism forever. They even took the "fair" comment and applied it to all Brahmins. According to them, all Brahmins are fair and hence have a god-given right to lord over others. What is called the Eton boys' network serves the same purpose in this casteism. The only solution is to put those ideas under the limelight and investigate them. To take the above: how does one prove that all fair people are wise and peaceful while all black and brown people are violent? If that were so, how does one account for Mahatma Gandhi, Martin Luther King Jr., Nelson Mandela, Michael Jackson; the list is probably endless. And not to forget that when Mahatma Gandhi carried out his nonviolent movements, whether in India or in South Africa, black and brown people took part in their millions. Similarly for Martin Luther King Jr.: I know and have read of so many non-violent civil movements that took place in the U.S., for example Rosa Parks and the Montgomery Bus Boycott. So just based on these examples, one can conclude that at least the part about the fair having an exclusive claim to being wise and noble is not correct. Now, as far as violence goes, every race and every community has done violence in the past or been a victim of it. So no one is or can be blameless, although in light of the above statement, the question can be asked: who were the Vikings? Both popular imagination and serious history share stories about the Vikings. They were somewhat nomadic in nature; even though they had permanent settlements, they went on raids, raped women, and captured both men and women and sold them as slaves. So they are what pirates came to be, but not the kind Hollywood romanticizes. Europe itself has been a tale of conflict since time immemorial. It is only after the formation of the EU that most of these countries stopped fighting each other; from a historical perspective, that is very recent. So even the part about the fair being non-violent dies in the face of this evidence.
I could go on but this is enough on that topic.

Railways and Industrial Action around the World. While I have written about railways so many times on this blog, it continues to fascinate me how people don't understand the first thing about railways. For example, a railway is a natural monopoly. What that means is that, whatever type of privatization you look at around the world, you will see it remains a monopoly. Unlike the roads or the skies, railways are and will always be limited by infrastructure and the ability to build new infrastructure. Unlike on the roads or in the skies (even those have their limits), you cannot run train services on a whim. At any particular point in time, only a single train can and should occupy a stretch of railway network. You could have more trains on one line, but then the likelihood of front or rear-end collisions becomes a real possibility. You also need all sorts of good and reliable communications and redundant infrastructure, so that if one thing fails you have something else in place. The reason is that a single train can carry anywhere from 2,000 to 5,000 passengers or more. While this is true of Indian Railways, railways around the world probably have similar numbers. It is in this light that I share the videos below.
To be more precise, see the fuller video
Now, to give context to the recording above, Mike Lynch is the general secretary of the RMT. For those who came in late, both the UK and the U.S. have been threatened by railway strikes, and the reasons for the strikes, or the threat of strikes, are similar. From the companies' perspective, all they care about is investing less and making the most profit that can be handed to equity shareholders. At the same time, they have frozen the salaries of railway workers for the last 3 years, while the politicians who were asking the questions apparently gave themselves a raise twice this year. They are asking the workers to negotiate at 8% while inflation in the UK has been 12.3% and is projected to go higher. And it is not only about the money. Since the 1990s, when the UK privatized its railways, they stopped investing in the infrastructure. That meant that UK railway infrastructure fell behind over time and is now even behind, say, Indian Railways, which used to provide the most bang for the buck; and Indian Railways is far from ideal. Ironically, most of the operators in the UK are the nationalized railways of France, Germany etc., but after the hard Brexit, they too are mulling cutting their operations short. There is also the EU Entry/Exit system that is coming next year. Why am I sharing what is happening in UK rail? Because the Indian Government wants to follow the same path, fooling the public into believing we would do it better. What inevitably happens is that ticket prices go up, people no longer use the service, the number of services goes down, and eventually they are cancelled. This has happened with both Indian Railways and Indian airlines. In fact, GOI announced a credit scheme just a few days back to help airlines stay afloat. I was chatting with a friend who had come down to Pune from Chennai, and the round trip cost him INR 15k/- on that single trip alone. We reminisced how a few years ago, 8 years to be precise, we could buy an air ticket for 2.5k/- just a few days before the trip, and we did. I remember doing at least a dozen odd trips by air in the years before 2014. My friend used to come to Pune almost every weekend because he could afford it; now he can't do that. And these are people who are in the top 5-10% of the population. And this is not just in the UK, but also in the United States. There is one big difference, though: the U.S. is mainly a freight carrier while UK railway operations are mostly passenger-based. What was and is interesting is that Scotland had to nationalize its services as it realized the operators cannot or will not function when they are most needed. Most of the public even in the UK seem to want a nationalized rail service, or at least the polls say so. So it would definitely be interesting to see what happens in the UK next year. In the end, I know I promised to share about books, but the above incidents have just been too fascinating not to share, along with what I think about them. Free markets work well where there is competition, for example what is and has been happening in China with EVs, but not where you have natural monopolies. In all railway privatizations, you have to hand the area over to one operator, and then they have no motivation. If you have multiple operators, then there is always haggling over who will run the train and at what time.
In either scenario, it doesn't work: it raises prices while not delivering anything better. I take examples from the UK because a lot of things in India are still a legacy of the British. The whole civil service that was created in 1953 was a copy of the British civil service of that time, and it remains so to this day. P.S. Just came to know that Kwasi Kwarteng was sacked as UK Chancellor. I do commend Truss for facing the press, even though she might be dumped a week later, unlike our PM who hasn't faced a single press conference in the last 8-odd years.

https://www.youtube.com/watch?v=oTP6ogBqU7of The difference between Indian and UK politics seems to be that the English are now asking questions, while here in India most people are still sleeping without a care in the world. Another thing to note: Minidebconf Palakkad is going to happen on 12-13th November 2022. I am probably not going to go, but would request everyone who wants to do something in free software to attend it. I am not sure whether I would be of any use like this, and also when I get back it would be an empty house. But for people young and old who want to do anything with free/open source software, it is a chance not to be missed. Registration closes on the 1st of November 2022. All the best, break a leg. Just read this, beautifully done.

29 September 2022

Antoine Beaupré: Detecting manual (and optimizing large) package installs in Puppet

Well this is a mouthful. I recently worked on a neat hack called puppet-package-check. It is designed to warn about manually installed packages, to make sure "everything is in Puppet". But it turns out it can (probably?) dramatically decrease the bootstrap time of Puppet when it needs to install a large number of packages.

Detecting manual packages On a cleanly filed workstation, it looks like this:
root@emma:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
0 unmanaged packages found
A messy workstation will look like this:
root@curie:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
288 unmanaged packages found
apparmor-utils beignet-opencl-icd bridge-utils clustershell cups-pk-helper davfs2 dconf-cli dconf-editor dconf-gsettings-backend ddccontrol ddrescueview debmake debootstrap decopy dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dnsdiag dropbear-initramfs ebtables efibootmgr elpa-lua-mode entr eog evince figlet file file-roller fio flac flex font-manager fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi freetype2-demos ftp fwupd-amd64-signed gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdisk gdm3 gdu gedit gedit-plugins gettext-base git-debrebase gnome-boxes gnote gnupg2 golang-any golang-docker-credential-helpers golang-golang-x-tools grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gtypist gvfs-backends hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-gtk3 ibus-libpinyin ibus-pinyin im-config imediff img2pdf imv initramfs-tools input-utils installation-birthday internetarchive ipmitool iptables iptraf-ng jackd2 jupyter jupyter-nbextension-jupyter-js-widgets jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools keyboard-configuration kfind konsole krb5-locales kwin-x11 leiningen lightdm lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 lynx lz4json magic-wormhole mailscripts mailutils manuskript mat2 mate-notification-daemon mate-themes mime-support mktorrent mp3splt mpdris2 msitools mtp-tools mtree-netbsd mupdf nautilus nautilus-sendto ncal nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache npm2deb ntfs-3g ntpdate nvme-cli nwipe obs-studio okular-extra-backends openstack-clients openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv playerctl plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby ruby-dev ruby-feedparser ruby-magic ruby-mocha ruby-ronn rygel-playbin rygel-tracker s-tui sanoid saytime scrcpy scrcpy-server screenfetch scrot sdate sddm seahorse shim-signed sigil smartmontools smem smplayer sng sound-juicer sound-theme-freedesktop spectre-meltdown-checker sq ssh-audit sshuttle stress-ng strongswan strongswan-swanctl syncthing system-config-printer system-config-printer-common system-config-printer-udev systemd-bootchart systemd-container tardiff task-desktop task-english task-ssh-server tasksel tellico texinfo texlive-fonts-extra texlive-lang-cyrillic texlive-lang-french texlive-lang-german texlive-lang-italian texlive-xetex tftp-hpa thunar-archive-plugin tidy tikzit tint2 tintin++ tipa tpm2-tools traceroute tree trocla ucf udisks2 unifont unrar-free upower usbguard uuid-runtime vagrant-cachier vagrant-libvirt virt-manager vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas wireshark xapian-tools xclip xdg-user-dirs-gtk xlax xmlto xsensors xserver-xorg xsltproc xxd xz-utils yubioath-desktop zathura zathura-pdf-poppler zenity zfs-dkms zfs-initramfs zfsutils-linux zip zlib1g zlib1g-dev
157 old: apparmor-utils clustershell davfs2 dconf-cli dconf-editor ddccontrol ddrescueview decopy dnsdiag ebtables efibootmgr elpa-lua-mode entr figlet file-roller fio flac flex font-manager freetype2-demos ftp gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdu gedit git-debrebase gnote golang-docker-credential-helpers golang-golang-x-tools gtypist hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-pinyin imediff input-utils internetarchive ipmitool iptraf-ng jackd2 jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools kfind konsole leiningen lightdm lynx lz4json magic-wormhole manuskript mat2 mate-notification-daemon mktorrent mp3splt msitools mtp-tools mtree-netbsd nautilus nautilus-sendto nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache ntpdate nwipe obs-studio openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv playerctl qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby-feedparser ruby-magic ruby-mocha ruby-ronn s-tui saytime scrcpy screenfetch scrot sdate seahorse shim-signed sigil smem smplayer sng sound-juicer spectre-meltdown-checker sq ssh-audit sshuttle stress-ng system-config-printer system-config-printer-common tardiff tasksel tellico texlive-lang-cyrillic texlive-lang-french tftp-hpa tikzit tint2 tintin++ tpm2-tools traceroute tree unrar-free vagrant-cachier vagrant-libvirt vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas xdg-user-dirs-gtk xlax xmlto xsensors xxd yubioath-desktop zenity zip
131 new: beignet-opencl-icd bridge-utils cups-pk-helper dconf-gsettings-backend debmake debootstrap dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dropbear-initramfs eog evince file fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi fwupd-amd64-signed gdisk gdm3 gedit-plugins gettext-base gnome-boxes gnupg2 golang-any grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gvfs-backends ibus-gtk3 ibus-libpinyin im-config img2pdf imv initramfs-tools installation-birthday iptables jupyter jupyter-nbextension-jupyter-js-widgets keyboard-configuration krb5-locales kwin-x11 lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 mailscripts mailutils mate-themes mime-support mpdris2 mupdf ncal npm2deb ntfs-3g nvme-cli okular-extra-backends openstack-clients plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap ruby ruby-dev rygel-playbin rygel-tracker sanoid scrcpy-server sddm smartmontools sound-theme-freedesktop strongswan strongswan-swanctl syncthing system-config-printer-udev systemd-bootchart systemd-container task-desktop task-english task-ssh-server texinfo texlive-fonts-extra texlive-lang-german texlive-lang-italian texlive-xetex thunar-archive-plugin tidy tipa trocla ucf udisks2 unifont upower usbguard uuid-runtime virt-manager wireshark xapian-tools xclip xserver-xorg xsltproc xz-utils zathura zathura-pdf-poppler zfs-dkms zfs-initramfs zfsutils-linux zlib1g zlib1g-dev
Yuck! That's a lot of shit to go through. Notice how the packages get sorted between "old" and "new" packages. This is because popcon is used as a tool to mark which packages are "old". If you have unmanaged packages, the "old" ones are likely things that you can uninstall, for example. If you don't have popcon installed, you'll also get this warning:
popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'
The error can otherwise be safely ignored, but you won't get "help" prioritizing the packages to add to your manifests. Note that the tool ignores packages that were "marked" (see apt-mark(8)) as automatically installed. This implies that you might have to do a little bit of cleanup the first time you run this, as Debian doesn't necessarily mark all of those packages correctly on first install. For example, here's what it looks like on a clean install, after Puppet ran:
root@angela:/home/anarcat# ./bin/puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
127 unmanaged packages found
ca-certificates console-setup cryptsetup-initramfs dbus file gcc-12-base gettext-base grub-common grub-efi-amd64 i3lock initramfs-tools iw keyboard-configuration krb5-locales laptop-detect libacl1 libapparmor1 libapt-pkg6.0 libargon2-1 libattr1 libaudit-common libaudit1 libblkid1 libbpf0 libbsd0 libbz2-1.0 libc6 libcap-ng0 libcap2 libcap2-bin libcom-err2 libcrypt1 libcryptsetup12 libdb5.3 libdebconfclient0 libdevmapper1.02.1 libedit2 libelf1 libext2fs2 libfdisk1 libffi8 libgcc-s1 libgcrypt20 libgmp10 libgnutls30 libgpg-error0 libgssapi-krb5-2 libhogweed6 libidn2-0 libip4tc2 libiw30 libjansson4 libjson-c5 libk5crypto3 libkeyutils1 libkmod2 libkrb5-3 libkrb5support0 liblocale-gettext-perl liblockfile-bin liblz4-1 liblzma5 libmd0 libmnl0 libmount1 libncurses6 libncursesw6 libnettle8 libnewt0.52 libnftables1 libnftnl11 libnl-3-200 libnl-genl-3-200 libnl-route-3-200 libnss-systemd libp11-kit0 libpam-systemd libpam0g libpcre2-8-0 libpcre3 libpcsclite1 libpopt0 libprocps8 libreadline8 libselinux1 libsemanage-common libsemanage2 libsepol2 libslang2 libsmartcols1 libss2 libssl1.1 libssl3 libstdc++6 libsystemd-shared libsystemd0 libtasn1-6 libtext-charwidth-perl libtext-iconv-perl libtext-wrapi18n-perl libtinfo6 libtirpc-common libtirpc3 libudev1 libunistring2 libuuid1 libxtables12 libxxhash0 libzstd1 linux-image-amd64 logsave lsb-base lvm2 media-types mlocate ncurses-term pass-extension-otp puppet python3-reportbug shim-signed tasksel ucf usr-is-merged util-linux-extra wpasupplicant xorg zlib1g
popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'
Normally, there should be no unmanaged packages here. But because of the way Debian is installed, a lot of libraries and some core packages are marked as manually installed, and are of course not managed through Puppet. There are two solutions to this problem:
  • really manage everything in Puppet (argh)
  • mark packages as automatically installed
I typically chose the second path and mark a ton of stuff as automatic. Then either they will be auto-removed, or will stop being listed. In the above scenario, one could mark all libraries as automatically installed with:
apt-mark auto $(./bin/puppet-package-check | grep -o 'lib[^ ]*')
... but if you trust that most of that stuff is actually garbage that you don't really want installed anyways, you could just mark it all as automatically installed:
apt-mark auto $(./bin/puppet-package-check)
In my case, that ended up keeping basically all libraries (because of course they're installed for some reason) and auto-removing this:
dh-dkms discover-data dkms libdiscover2 libjsoncpp25 libssl1.1 linux-headers-amd64 mlocate pass-extension-otp pass-otp plocate x11-apps x11-session-utils xinit xorg
You'll notice xorg in there: yep, that's bad. Not what I wanted. But for some reason, on other workstations, I did not actually have xorg installed. Turns out having xserver-xorg is enough, and that one has dependencies. So now I guess I just learned to stop worrying and live without X(org).
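For comparison, here is a rough approximation of the same idea using only stock apt tools. This is not what puppet-package-check does internally, and puppet-packages.txt is a hypothetical file containing the Puppet-managed package names, one per line; but it shows the general shape of the check:

# list packages apt considers manually installed
apt-mark showmanual | sort > manual.txt
# hypothetical list of packages declared in Puppet
sort -u puppet-packages.txt > managed.txt
# packages installed by hand but absent from the Puppet list
comm -23 manual.txt managed.txt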

Optimizing large package installs But that, of course, is not all. Why make things simple when you can have an unreadable title that is trying to be both syntactically correct and click-baity enough to flatter my vain ego? Right. One of the challenges in bootstrapping Puppet with large package lists is that it's slow. Puppet lists packages as individual resources and will basically run apt install $PKG on every package in the manifest, one at a time. While the overhead of apt is generally small, when you add things like apt-listbugs, apt-listchanges, needrestart, triggers and so on, it can take forever setting up a new host. So for initial installs, it can actually make sense to skip the queue and just install everything in one big batch. And because the above tool inspects the packages installed by Puppet, you can run it against a catalog and have a full list of all the packages Puppet would install, even before Puppet has run. So when reinstalling my laptop, I basically did this:
apt install puppet-agent/experimental
puppet agent --test --noop
apt install $(./puppet-package-check --debug \
    2>&1 | grep ^puppet\ packages |
      sed 's/puppet packages://;s/ /\n/g' |
      grep -v -e onionshare -e golint -e git-sizer -e github-backup -e hledger -e xsane -e audacity -e chirp -e elpa-flycheck -e elpa-lsp-ui -e yubikey-manager -e git-annex -e hopenpgp-tools -e puppet
) puppet-agent/experimental
That massive grep was because there are currently a lot of packages missing from bookworm. Those are all packages that I have in my catalog but that still haven't made it to bookworm. Sad, I know. I eventually worked around that by adding bullseye sources so that the Puppet manifest actually ran. The point here is that this improves the Puppet run time a lot. All packages get installed at once, and you get a nice progress bar. Then you actually run Puppet to deploy configurations and all the other goodies:
puppet agent --test
I wish I could tell you how much faster that ran. I don't know, and I will not go through a full reinstall just to please your curiosity. The only hard number I have is that it installed 444 packages (which exploded into 10,191 packages with dependencies) in a mere 10 minutes. That might also be with the packages already downloaded. In any case, I have that gut feeling it's faster, so you'll have to just trust my gut. It is, after all, much more important than you might think.

Similar work The blueprint system is something similar to this:
It figures out what you've done manually, stores it locally in a Git repository, generates code that's able to recreate your efforts, and helps you deploy those changes to production
That tool has unfortunately been abandoned for a decade at this point. Also note that the AutoRemove::RecommendsImportant and AutoRemove::SuggestsImportant apt settings are relevant here. If such a setting is true (the default), a package will not be removed if it is (respectively) a Recommends or Suggests of another package (as opposed to a normal Depends). In other words, if you want to also auto-remove packages that are only Suggests, you would, for example, add this to apt.conf:
AutoRemove::SuggestsImportant false;
Paul Wise has tried to make the Debian installer and debootstrap properly mark packages as automatically installed in the past, but his bug reports were rejected. The other suggestions in this section are also from Paul, thanks!

25 August 2022

Jonathan Dowland: dues (or blues)

After I wrote about hledger, I got some good feedback, both from a friend in person and also on Twitter. My in-person friend asked, frankly, whether I really try to manage money like this, tracking every single expense. Which affirms my suspicion that many people don't, and that it perhaps isn't essential to do so. Combined with the details below, 3/4 of the way through my experiment with using hledger, I'm not convinced that it has been a good idea. I'm quoting my Twitter feedback here in order to respond. The context is handling the case where I have used the "wrong" card to pay for something: a card affiliated with my family expenses for something personal, or vice versa. With double-entry book-keeping, and one pair of transactions, the destination account can either record the expense category:
2022-08-20  coffee
    family:liabilities:creditcard     -3
    jon:expenses:coffee                3
or the fact it was paid for on the wrong card
2022-08-20  coffee
    family:liabilities:creditcard     -3
    family:liabilities:jon             3 ; jon owes family
but not easily both. https://twitter.com/pranesh/status/1516819846431789058:
When you accidentally use the family CV for personal expenses, credit the account "family:liabilities:creditcard:jon" instead of "family:liabilities:creditcard". That'll allow you to track w/ 2 postings.
This is an interesting idea: create a sub-account underneath the credit card, and I would have a separate balance representing the money I owed. Before:
$ hledger bal -t
              -3  family:liabilities:creditcard
               3  jon:expenses:coffee
The proposed transaction:
2022-08-20  coffee
    family:liabilities:creditcard:jon     -3
    jon:expenses:coffee                    3
Corresponding balances
$ hledger bal -t
              -3  family:liabilities:creditcard
              -3    jon
               3  jon:expenses:coffee
Great. However, what process would clear the balance on that sub-account? In practice, I don't make a separate, explicit payment to the credit card from my personal accounts. It's paid off in full by direct debit from the family shared account. In practice, such dues are accumulated and settled with one-off bank transfers, now and then. Since the sub-account is still part of the credit card hierarchy, I can't just use a set of virtual postings to consolidate that value with other liabilities, or cover it. Any transaction in there which did not correspond to a real transaction on the credit card would make the balance drift away from the real-world credit card statements. The only way I could see this working would be if the direct debit that settles the credit card was artificially split to clear the sub-account, and then the amount owed would be lost. https://twitter.com/pranesh/status/1516819846431789058:
Else, add: family:assets:receivable:jon $3
jon:liabilities:family:cc $-3
A "receivable" account would function like the "dues" accounts I described in hledger (except "receivable" is an established account type in double-entry book-keeping). Here I think Pranesh is proposing using these two accounts in addition to the others on a posting. E.g.
2022-08-20  coffee
    family:liabilities:creditcard     -3
    jon:expenses:coffee                3
    family:assets:receivable:jon       3
    jon:liabilities:family            -3
This balances, and we end up with two other accounts, which are tracking the exact same thing. I only owe 3, but if you didn't know that the accounts were "views" onto the same thing, you could mistakenly think I owed 6. I can't see the advantage of this over just using a virtual, unbalanced posting.

Dues, Liabilities I'd invented accounts called "dues" to track moneys owed. The more correct term for this in accounting parlance would be "accounts receivable", as in one of the examples above. I could instead be tracking moneys due; this is a classic liability. Liabilities have negative balances.
jon:liabilities:family    -3
This means, I owe the family 3. Liability accounts like that are identical to "dues" accounts. A positive balance in a Liability is a counter-intuitive way of describing moneys owed to me, rather than by me. And, reviewing a lot of the coding I did this year, I've got myself hopelessly confused with the signs, and made lots of errors. Crucially, double-entry has not protected me from making those mistakes: of course, I'm circumventing it by using unbalanced virtual postings in many cases (although I was not consistent in where I did this), but even if I used a pair of accounts as in the last example above, I could still screw it up.
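For what it's worth, this is what such an unbalanced virtual posting can look like in hledger syntax, using the same coffee example; the parenthesised account is exempt from transaction balancing, and family:dues:jon is just an illustrative name:

2022-08-20  coffee
    family:liabilities:creditcard     -3
    jon:expenses:coffee                3
    (family:dues:jon)                  3  ; virtual: jon owes family

Such postings show up in hledger bal like any other unless you filter them out with --real, which is exactly the convenience, and the danger: they bypass the double-entry check entirely.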

14 August 2022

Russ Allbery: Review: Still Not Safe

Review: Still Not Safe, by Robert L. Wears & Kathleen M. Sutcliffe
Publisher: Oxford University Press
Copyright: November 2019
ISBN: 0-19-027128-0
Format: Kindle
Pages: 232
Still Not Safe is an examination of the recent politics and history of patient safety in medicine. Its conclusions are summarized by the opening paragraph of the preface:
The American moral and social philosopher Eric Hoffer reportedly said that every great cause begins as a movement, becomes a business, and eventually degenerates into a racket. The reform movement to make healthcare safer is clearly a great cause, but patient safety efforts are increasingly following Hoffer's path.
Robert Wears was Professor of Emergency Medicine at the University of Florida specializing in patient safety. Kathleen Sutcliffe is Professor of Medicine and Business at Johns Hopkins. This book is based on research funded by a grant from the Robert Wood Johnson Foundation, for which both Wears and Sutcliffe were primary investigators. (Wears died in 2017, but the acknowledgments imply that at least early drafts of the book existed by that point and it was indeed co-written.) The anchor of the story of patient safety in Still Not Safe is the 1999 report from the Institute of Medicine entitled To Err is Human, to which the authors attribute an explosion of public scrutiny of medical safety. The headline conclusion of that report, which led nightly news programs after its release, was that 44,000 to 120,000 people died each year in the United States due to medical error. This report prompted government legislation, funding for new safety initiatives, a flurry of follow-on reports, and significant public awareness of medical harm. What it did not produce, in the authors' view, is significant improvements in patient safety. The central topic of this book is an analysis of why patient safety efforts have had so little measurable effect. The authors attribute this to three primary causes: an unwillingness to involve safety experts from outside medicine or absorb safety lessons from other disciplines, an obsession with human error that led to profound misunderstandings of the nature of safety, and the misuse of safety concerns as a means to centralize control of medical practice in the hands of physician-administrators. (The term used by the authors is "managerial, scientific-bureaucratic medicine," which is technically accurate but rather awkward.) Biggest complaint first: This book desperately needed examples, case studies, or something to make these ideas concrete. There are essentially none in 230 pages apart from passing mentions of famous cases of medical error that added to public pressure, and a tantalizing but maddeningly nonspecific discussion of the atypically successful effort to radically improve the safety of anesthesia. Apparently anesthesiologists involved safety experts from outside medicine, avoided a focus on human error, turned safety into an engineering problem, and made concrete improvements that had a hugely positive impact on the number of adverse events for patients. Sounds fascinating! Alas, I'm just as much in the dark about what those improvements were as I was when I started reading this book. Apart from a vague mention of some unspecified improvements to anesthesia machines, there are no concrete descriptions whatsoever. I understand that the authors were probably leery of giving too many specific examples of successful safety initiatives since one of their core points is that safety is a mindset and philosophy rather than a replicable set of actions, and copying the actions of another field without understanding their underlying motivations or context within a larger system is doomed to failure. But you have to give the reader something, or the book starts feeling like a flurry of abstract assertions. Much is made here of the drawbacks of a focus on human error, and the superiority of the safety analysis done in other fields that have moved beyond error-centric analysis (and in some cases have largely discarded the word "error" as inherently unhelpful and ambiguous). 
That leads naturally to showing an analysis of an adverse incident through an error lens and then through a more nuanced safety lens, making the differences concrete for the reader. It was maddening to me that the authors never did this. This book was recommended to me as part of a discussion about safety and reliability in tech and the need to learn from safety practices in other fields. In that context, I didn't find it useful, although surprisingly that's because the thinking in medicine (at least as presented by these authors) seems behind the current thinking in distributed systems. The idea that human error is not a useful model for approaching reliability is standard in large tech companies, nearly all of which use blameless postmortems for exactly that reason. Tech, similar to medicine, does have a tendency to be insular and not look outside the field for good ideas, but the approach to large-scale reliability in tech seems to have avoided the other traps discussed here. (Security is another matter, but security is also adversarial, which creates different problems that I suspect require different tools.) What I did find fascinating in this book, although not directly applicable to my own work, is the way in which a focus on human error becomes a justification for bureaucratic control and therefore a concentration of power in a managerial layer. If the assumption is that medical harm is primarily caused by humans making avoidable mistakes, and therefore the solution is to prevent humans from making mistakes through better training, discipline, or process, this creates organizations that are divided into those who make the rules and those who follow the rules. The long-term result is a practice of medicine in which a small number of experts decide the correct treatment for a given problem, and then all other practitioners are expected to precisely follow that treatment plan to avoid "errors." (The best distributed systems approaches may avoid this problem, but this failure mode seems nearly universal in technical support organizations.) I was startled by how accurate that portrayal of medicine felt. My assumption prior to reading this book was that the modern experience of medicine as an assembly line with patients as widgets was caused by the pressure for higher "productivity" and thus shorter visit times, combined with (in the US) the distorting effects of our broken medical insurance system. After reading this book, I've added a misguided way of thinking about medical error and risk avoidance to that analysis. One of the authors' points (which, as usual, I wish they'd made more concrete with a case study) is that the same thought process that lets a doctor make a correct diagnosis and find a working treatment is the thought process that may lead to an incorrect diagnosis or treatment. There is not a separable state of "mental error" that can be eliminated. Decision-making processes are more complicated and more integrated than that. If you try to prevent "errors" by eliminating flexibility, you also eliminate vital tools for successfully treating patients. The authors are careful to point out that the prior state of medicine in which each doctor was a force to themselves and there was no role for patient safety as a discipline was also bad for safety. Reverting to the state of medicine before the advent of the scientific-bureaucratic error-avoiding culture is also not a solution. 
But, rather at odds with other popular books about medicine, the authors are highly critical of safety changes focused on human error prevention, such as mandatory checklists. In their view, this is exactly the sort of attempt to blindly copy the machinery of safety in another field (in this case, air travel) without understanding the underlying purpose and system of which it's a part. I am not qualified to judge the sharp dispute over whether there is solid clinical evidence that checklists are helpful (these authors claim there is not; I know other books make different claims, and I suspect it may depend heavily on how the checklist is used). But I found the authors' argument that one has to design systems holistically for safety, not try to patch in safety later by turning certain tasks into rote processes and humans into machines, to be persuasive. I'm not willing to recommend this book given how devoid it is of concrete examples. I was able to fill in some of that because of prior experience with the literature on site reliability engineering, but a reader who wasn't previously familiar with discussions of safety or reliability may find much of this book too abstract to be comprehensible. But I'm not sorry I read it. I hadn't previously thought about the power dynamics of a focus on error, and I think that will be a valuable observation to keep in mind. Rating: 6 out of 10

4 July 2022

Russ Allbery: Review: She Who Became the Sun

Review: She Who Became the Sun, by Shelley Parker-Chan
Series: Radiant Emperor #1
Publisher: Tor
Copyright: 2021
Printing: 2022
ISBN: 1-250-62179-8
Format: Kindle
Pages: 414
In 1345 in Zhongli village, in fourth year of a drought, lived a man with his son and his daughter, the last surviving of seven children. The son was promised by his father to the Wuhuang Monastery on his twelfth birthday if he survived. According to the fortune-teller, that son, Zhu Chongba, will be so great that he will bring a hundred generations of pride to the family name. When the girl dares ask her fate, the fortune-teller says, simply, "Nothing." Bandits come looking for food and kill their father. Zhu goes catatonic rather than bury his father, so the girl digs a grave, only to find her brother dead inside it with her father. It leaves her furious: he had a great destiny and he gave it up without a fight, choosing to become nothing. At that moment, she decides to seize his fate for her own, to become Zhu so thoroughly that Heaven itself will be fooled. Through sheer determination and force of will, she stays at the gates of Wuhuang Monastery until the monks are impressed enough with her stubbornness that they let her in under Zhu's name. That puts her on a trajectory that will lead her to the Red Turbans and the civil war over the Mandate of Heaven. She Who Became the Sun is historical fiction with some alternate history and a touch of magic. The closest comparison I can think of is Guy Gavriel Kay: a similar touch of magic that is slight enough to have questionable impact on the story, and a similar starting point of history but a story that's not constrained to follow the events of our world. Unlike Kay, Parker-Chan doesn't change the names of places and people. It's therefore not difficult to work out the history this story is based on (late Yuan dynasty), although it may not be clear at first what role Zhu will play in that history. The first part of the book focuses on Zhu, her time in the monastery, and her (mostly successful) quest to keep her gender secret. The end of that part introduces the second primary protagonist, the eunuch general Ouyang of the army of the Prince of Henan. Ouyang is Nanren, serving a Mongol prince or, more precisely, his son Esen. On the surface, Ouyang is devoted to Esen and serves capably as his general. What lies beneath that surface is far darker and more complicated. I think how well you like this book will depend on how well you get along with the characters. I thought Zhu was a delight. She spends the first half of the book proving herself to be startlingly competent and unpredictable while outwitting Heaven and pursuing her assumed destiny. A major hinge event at the center of the book could have destroyed her character, but instead makes her even stronger, more relaxed, and more comfortable with herself. Her story's exploration of gender identity only made that better for me, starting with her thinking of herself as a woman pretending to be a man and turning into something more complex and self-chosen (and, despite some sexual encounters, apparently asexual, which is something you still rarely see in fiction). I also appreciated how Parker-Chan varies Zhu's pronouns depending on the perspective of the narrator. That said, Zhu is not a good person. She is fiercely ambitious to the point of being a sociopath, and the path she sees involves a lot of ruthlessness and some cold-blooded murder. This is less of a heroic journey than a revenge saga, where the target of revenge is the entire known world and Zhu is as dangerous as she is competent. If you want your protagonist to be moral, this may not work for you. 
Zhu's scenes are partly told from her perspective and partly from the perspective of a woman named Ma who is a good person, and who is therefore intermittently horrified. The revenge story worked for me, and as a result I found Ma somewhat irritating. If your tendency is to agree with Ma, you may find Zhu too amoral to root for. Ouyang's parts I just hated, which is fitting because Ouyang loathes himself to a degree that is quite difficult to read. He is obsessed with being a eunuch and therefore not fully male. That internal monologue is disturbing enough that it drowned out the moderately interesting court intrigue that he's a part of. I know some people like reading highly dramatic characters who are walking emotional disaster zones. I am not one of those people; by about three quarters of the way through the book I was hoping someone would kill Ouyang already and put him out of everyone's misery. One of the things I disliked about this book is that, despite the complex gender work with Zhu, gender roles within the story have a modern gloss while still being highly constrained. All of the characters except Zhu (and the monk Xu, who has a relatively minor part but is the most likable character in the book) feel like they're being smothered in oppressive gender expectations. Ouyang has a full-fledged case of toxic masculinity to fuel his self-loathing, which Parker-Chan highlights with some weirdly disturbing uses of BDSM tropes. So, I thought this was a mixed bag, and I suspect reactions will differ. I thoroughly enjoyed Zhu's parts despite her ruthlessness and struggled through Ouyang's parts with a bad taste in my mouth. I thought the pivot Parker-Chan pulls off in the middle of the book with Zhu's self-image and destiny was beautifully done and made me like the character even more, but I wish the conflict between Ma's and Zhu's outlooks hadn't been so central. Because of that, the ending felt more tragic than triumphant, which I think was intentional but which wasn't to my taste. As with Kay's writing, I suspect there will be some questions about whether She Who Became the Sun is truly fantasy. The only obvious fantastic element is the physical manifestation of the Mandate of Heaven, and that has only a minor effect on the plot. And as with Kay, I think this book needed to be fantasy, not for the special effects, but because it needs the space to take fate literally. Unlike Kay, Parker-Chan does not use the writing style of epic fantasy, but Zhu's campaign to assume a destiny which is not her own needs to be more than a metaphor for the story to work. I enjoyed this with some caveats. For me, the Zhu portions made up for the Ouyang portions. But although it's clearly the first book of a series, I'm not sure I'll read on. I felt like Zhu's character arc reached a satisfying conclusion, and the sequel seems likely to be full of Ma's misery over ethical conflicts and more Ouyang, neither of which sound appealing. So far as I can tell, the sequel I assume is coming has not yet been announced. Rating: 7 out of 10

17 June 2022

Antoine Beaupré: Matrix notes

I have some concerns about Matrix (the protocol, not the movie that came out recently, although I do have concerns about that as well). I've been watching the project for a long time, and it seems like a promising alternative to many protocols like IRC, XMPP, and Signal. This review may sound a bit negative, because it focuses on those concerns. I am the operator of an IRC network and people keep asking me to bridge it with Matrix. I have myself considered just giving up on IRC and converting to Matrix. This article is a living document exploring my research into that problem space. The TL;DR: is that no, I'm not setting up a bridge just yet, and I'm still on IRC. This article was written over the course of the last three months, but I have been watching the Matrix project for years (my logs seem to say 2016 at least). The article is rather long. It will likely take you half an hour to read, so copy this over to your ebook reader, your tablet, or dead trees, and lean back and relax as I show you around the Matrix. Or, alternatively, just jump to a section that interests you, most likely the conclusion.

Introduction to Matrix Matrix is an "open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history". It's also (when compared with XMPP) "an eventually consistent global JSON database with an HTTP API and pubsub semantics - whilst XMPP can be thought of as a message passing protocol." According to their FAQ, the project started in 2014, has about 20,000 servers, and millions of users. Matrix works over HTTPS but over a special port: 8448.
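As a quick illustration of that last point (the hostnames below are placeholders, and this assumes my reading of the spec is correct), both the delegation lookup and the federation API itself can be poked at with plain HTTPS:

# where does example.org delegate its homeserver to?
curl -s https://example.org/.well-known/matrix/server
# talk to the federation API directly, on the default federation port
curl -s https://matrix.example.org:8448/_matrix/federation/v1/version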

Security and privacy I have some concerns about the security promises of Matrix. It's advertised as "secure" with "E2E [end-to-end] encryption", but how does it actually work?

Data retention defaults One of my main concerns with Matrix is data retention, which is a key part of security in a threat model where (for example) a hostile state actor wants to surveil your communications and can seize your devices. On IRC, servers don't actually keep messages all that long: they pass them along to other servers and clients as fast as they can, only keep them in memory, and move on to the next message. There are no concerns about data retention on messages (and their metadata) other than the network layer. (I'm ignoring the issues with user registration, which is a separate, if valid, concern.) Obviously, a hostile server could log everything passing through it, but IRC federations are normally tightly controlled. So, if you trust your IRC operators, you should be fairly safe. Obviously, clients can (and often do, even if OTR is configured!) log all messages, but this is generally not the default. Irssi, for example, does not log by default. IRC bouncers are more likely to log to disk, of course, to be able to do what they do. Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. In encrypted rooms those messages are encrypted, but not their metadata. There is a mechanism to expire entries in Synapse, but it is not enabled by default. So one should generally assume that a message sent on Matrix is never expired.
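To make that concrete: if I read the Synapse documentation correctly (option names may have changed between versions, and the lifetimes below are arbitrary examples), expiry is an explicit opt-in in homeserver.yaml along these lines:

# homeserver.yaml: message retention is off unless explicitly enabled
retention:
  enabled: true
  default_policy:
    min_lifetime: 1d
    max_lifetime: 90d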

GDPR in the federation But even if that setting were enabled by default, how do you control it? This is a fundamental problem of the federation: if any user is allowed to join a room (which is the default), those users' servers will log all content and metadata from that room. That includes private, one-on-one conversations, since those are essentially rooms as well. In the context of the GDPR, this is really tricky: who is the responsible party (known as the "data controller") here? It's basically any yahoo who fires up a home server and joins a room. In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:
  1. enumerate all the users that ever joined the room while you were there
  2. discover all their home servers
  3. start a GDPR procedure against all those servers
I recognize this is a hard problem to solve while still keeping an open ecosystem. But I believe that Matrix should have much stricter defaults towards data retention than it has right now. Message expiry should be enforced by default, for example. (Note that there are also redaction policies that could be used to implement part of the GDPR automatically, see the privacy policy discussion below on that.) Also keep in mind that, in the brave new peer-to-peer world that Matrix is heading towards, the boundary between server and client is likely to be fuzzier, which would make applying the GDPR even more difficult. Update: this comment links to this post (in German) which apparently studied the question and concluded that Matrix is not GDPR-compliant. In fact, maybe Synapse should be designed so that there's no configurable flag to turn off data retention. A bit like how most system loggers in UNIX (e.g. syslog) come with a log retention system that typically rotates logs after a few weeks or a month. Historically, this was designed to keep hard drives from filling up, but it also has the added benefit of limiting the amount of personal information kept on disk in this modern day. (Arguably, syslog doesn't rotate logs on its own, but, say, Debian GNU/Linux, as an installed system, does have log retention policies well defined for installed packages, and those can be discussed.) And "no expiry" is definitely a bug.

Matrix.org privacy policy When I first looked at Matrix, five years ago, Element.io was called Riot.im and had a rather dubious privacy policy:
We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service. [...] This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.
When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here). They also included a "free to snitch" clause:
If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.
Those are really broad terms, above and beyond what is typically expected legally. Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers. Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since. Notable points of the new privacy policies:
  • 2.3.1.1: the "federation" section actually outlines that "Federated homeservers and Matrix clients which respect the Matrix protocol are expected to honour these controls and redaction/erasure requests, but other federated homeservers are outside of the span of control of Element, and we cannot guarantee how this data will be processed"
  • 2.6: users under the age of 16 should not use the matrix.org service
  • 2.10: Upcloud, Mythic Beast, Amazon, and CloudFlare possibly have access to your data (it's nice to at least mention this in the privacy policy: many providers don't even bother admitting to this kind of delegation)
  • Element 2.2.1: mentions many more third parties (Twilio, Stripe, Quaderno, LinkedIn, Twitter, Google, Outplay, PipeDrive, HubSpot, Posthog, Sentry, and Matomo, phew!) used when you are paying Matrix.org for hosting
I'm not super happy with all the trackers they have on the Element platform, but then again you don't have to use that service. Your favorite homeserver (assuming you are not on Matrix.org) probably has their own Element deployment, hopefully without all that garbage. Overall, this is all a huge improvement over the previous privacy policy, so hats off to the Matrix people for figuring out a reasonable policy in such a tricky context. I particularly like this bit:
We will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.
It's great they implemented those mechanisms and, after all, if there's a hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyways, even with services typically seen as more secure, like Signal. As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC, which checks all the boxes in the geekfeminism wiki.

Metadata handling Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with whom, when, and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind. (Note: there is an issue open about message lifetimes in Element since 2020, but it's not even at the MSC stage yet.) This is a known issue (opened in 2019) in Synapse, but this is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep join/leave events for all rooms, which gives clear text information about who is talking to whom. Synapse logs may also contain personally identifiable information that home server admins might not be aware of in the first place. Those log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin. Combine this with federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those guarantees kind of get thrown out, because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation. To be fair, IRC doesn't have a great story here either: any client knows not only who's talking to whom in a room, but also typically their client IP address. Servers can (and often do) obfuscate this, but often that obfuscation is trivial to reverse. Some servers do provide "cloaks" (sometimes automatically), but that's kind of a "slap-on" solution that actually moves the problem elsewhere: now the server knows a little more about the user. Overall, I would worry much more about a Matrix home server seizure than an IRC or Signal server seizure. Signal does get subpoenas, and they can only give out a tiny bit of information about their users: their phone number, their registration date, and their last connection date. Matrix carries a lot more information in its database.

Amplification attacks on URL previews I (still!) run an Icecast server and sometimes share links to it on IRC which, obviously, also end up on (more than one!) Matrix home server because some people connect to IRC using Matrix. This, in turn, means that Matrix will connect to that URL to generate a link preview. I feel this points to a security issue, especially because those sockets would be kept open seemingly forever. I tried to warn the Matrix security team but, somehow, I don't think this issue was taken very seriously. Here's the disclosure timeline:
  • January 18: contacted Matrix security
  • January 19: response: already reported as a bug
  • January 20: response: can't reproduce
  • January 31: timeout added, considered solved
  • January 31: I respond that I believe the security issue is underestimated, ask for clearance to disclose
  • February 1: response: asked for a two-week delay after the next release (1.53.0), which would include another patch, presumably in two weeks' time
  • February 22: Matrix 1.53.0 released
  • April 14: I notice the release, ask for clearance again
  • April 14: response: referred to the public disclosure
There are a couple of problems here:
  1. the bug was publicly disclosed in September 2020, and not considered a security issue until I notified them, and even then, I had to insist
  2. no clear disclosure policy timeline was proposed or seems established in the project (there is a security disclosure policy but it doesn't include any predefined timeline)
  3. I wasn't informed of the disclosure
  4. the actual solution is a size limit (10MB, already implemented), a time limit (30 seconds, implemented in PR 11784), and a content type allow list (HTML, "media" or JSON, implemented in PR 11936), and I'm not sure it's adequate
  5. (pure vanity:) I did not make it to their Hall of fame
I'm not sure those solutions are adequate because they all seem to assume a single home server will pull that one URL for a little while, then stop. But in a federated network, many (possibly thousands of) home servers may be connected to a single room at once. If an attacker drops a link into such a room, all those servers will connect to that link at once. This is an amplification attack: a small amount of traffic generates a lot more traffic towards a single target. It doesn't matter that there are size or time limits: the amplification is what matters here. It should also be noted that clients that generate link previews add even more amplification, because they are more numerous than servers. And of course, the default Matrix client (Element) does generate link previews as well. That said, this is possibly not a problem specific to Matrix: any federated service that generates link previews may suffer from this. I'm honestly not sure what the solution is here. Maybe moderation? Maybe link previews are just evil? All I know is there was this weird bug in my Icecast server and I tried to ring the bell about it, and it feels like it was swept under the rug. Somehow I feel this is bound to blow up again in the future, even with the current mitigation.
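To put rough numbers on that worry, here's a back-of-the-envelope sketch using the size cap mentioned above; the number of servers and the message size are made-up assumptions, not measurements:

# Back-of-the-envelope amplification estimate (assumed numbers, not measured)
SIZE_LIMIT_MB = 10          # per-preview download cap mentioned above
SERVERS_IN_ROOM = 1000      # hypothetical number of federated home servers in the room
MESSAGE_SIZE_BYTES = 200    # rough size of the chat message carrying the URL

worst_case_mb = SIZE_LIMIT_MB * SERVERS_IN_ROOM
amplification = (worst_case_mb * 1024 * 1024) / MESSAGE_SIZE_BYTES
print(f"one ~{MESSAGE_SIZE_BYTES} byte message can trigger up to {worst_case_mb} MB "
      f"of traffic to the target (amplification factor ~{amplification:,.0f}x)")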

Moderation In Matrix, like elsewhere, moderation is a hard problem. There is a detailed moderation guide and much of this problem space is actively being worked on in Matrix right now. A fundamental problem with moderating a federated space is that a user banned from a room can rejoin the room from another server. This is why spam is such a problem with email, and why IRC networks stopped federating with each other ages ago (see the IRC history for that fascinating story).

The mjolnir bot The mjolnir moderation bot is designed to help with some of those things. It can kick and ban users, and redact all of a user's messages (as opposed to doing it one by one), all of this across multiple rooms. It can also subscribe to a federated block list published by matrix.org to block known abusers (users or servers). Bans are pretty flexible and can operate at the user, room, or server level. Matrix people suggest making the bot admin of your channels, because you can't take admin back from a user once it's given.

The command-line tool There's also a new command line tool designed to do things like:
  • send system notices to users (all users, users from a list, or a specific user)
  • delete sessions/devices not seen for X days
  • purge the remote media cache
  • select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
  • purge the history of these rooms
  • shutdown rooms
This tool and Mjolnir are based on the admin API built into Synapse.
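To give an idea of what that looks like, here is a hedged sketch of talking to the Synapse admin API directly (endpoint paths are from the Synapse admin API documentation as I understand it; the server and token are placeholders):

# Sketch: list the biggest rooms and purge old remote media via the admin API
import time
import requests

HOMESERVER = "https://matrix.example.com"
ADMIN_TOKEN = "syt_admin_token"              # access token of a server admin account
headers = {"Authorization": f"Bearer {ADMIN_TOKEN}"}

# list rooms known to the server, largest first
rooms = requests.get(
    f"{HOMESERVER}/_synapse/admin/v1/rooms",
    headers=headers,
    params={"order_by": "joined_members", "limit": 10},
    timeout=10,
).json()
for room in rooms.get("rooms", []):
    print(room["room_id"], room.get("name"), room.get("joined_members"))

# purge the remote media cache for anything older than 30 days
before_ts = int((time.time() - 30 * 24 * 3600) * 1000)  # milliseconds
requests.post(
    f"{HOMESERVER}/_synapse/admin/v1/purge_media_cache",
    headers=headers,
    params={"before_ts": before_ts},
    timeout=60,
).raise_for_status()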

Rate limiting Synapse has pretty good built-in rate-limiting which blocks repeated login, registration, joining, or messaging attempts. It may also end up throttling servers on the federation based on those settings.

Fundamental federation problems Because users joining a room may come from another server, room moderators are at the mercy of the registration and moderation policies of those servers. Matrix is like IRC's +R mode ("only registered users can join") by default, except that anyone can register their own homeserver, which makes this of limited use. Server admins can block IP addresses and home servers, but those tools are not easily available to room admins. There is an API (m.room.server_acl in /devtools) but it is not reliable (thanks Austin Huang for the clarification). Matrix has the concept of guest accounts, but it is not used very much, and virtually no client or homeserver supports it. This contrasts with the way IRC works: by default, anyone can join an IRC network even without authentication. Some channels require registration, but in general you are free to join and look around (until you get blocked, of course). I have seen anecdotal evidence (CW: Twitter, nitter link) that "moderating bridges is hell", and I can imagine why. Moderation is already hard enough within one federation; when you bridge a room with another network, you inherit all the problems from that network, but without all the abuse control tools from the original network's API...
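For illustration, this is roughly what the content of such an m.room.server_acl state event looks like, as I understand the spec (the server names are made up):

# Content of an m.room.server_acl state event (illustrative values only)
server_acl_content = {
    "allow": ["*"],                   # allow every server by default...
    "deny": ["spam.example.com"],     # ...except these
    "allow_ip_literals": False,       # refuse servers addressed by raw IP literals
}
print(server_acl_content)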

Room admins Matrix, in particular, has the problem that room administrators (who have the power to redact messages, ban users, and promote other users) are bound to their Matrix ID which is, in turn, bound to their home server. This implies that a home server administrator could (1) impersonate a given user and (2) use that to hijack the room. So in practice, the home server is the trust anchor for rooms, not the users themselves. That said, if server B's administrator hijacks user joe on server B, they will hijack that room only on that specific server. This will not (necessarily) affect users on the other servers, as servers could refuse parts of the updates or ban the compromised account (or server). It does seem like a major flaw that room credentials are bound to Matrix identifiers, as opposed to the E2E encryption credentials. In an encrypted room, even with fully verified members, a compromised or hostile home server can still take over the room by impersonating an admin. That admin (or even a newly minted user) can then send events or listen in on the conversations. This is even more frustrating when you consider that Matrix events are actually signed and therefore have some authentication attached to them, acting like some sort of Merkle tree (as each event contains a link to previous events). That signature, however, is made with the homeserver's PKI keys, not the client's E2E keys, which makes E2E feel like it has been "bolted on" later.

Availability While Matrix has a strong advantage over Signal in that it's decentralized (so anyone can run their own homeserver), I couldn't find an easy way to run a "multi-primary" setup, or even a "redundant" setup (even with a single primary backend), short of going full-on "replicate PostgreSQL and Redis data", which is typically not for the faint of heart.

How this works in IRC On IRC, it's quite easy to set up redundant nodes. All you need is:
  1. a new machine (with its own public address and an open port)
  2. a shared secret (or certificate) between that machine and an existing one on the network
  3. a connect block on both servers
That's it: the node will join the network, and people can connect to it as usual and share the same user/namespace as the rest of the network. The servers take care of synchronizing state: you do not need to worry about replicating a database server. (Now, experienced IRC people will know there's a catch here: IRC doesn't have authentication built in, and relies on "services", which are basically bots that authenticate users (I'm simplifying, don't nitpick). If that service goes down, the network still works, but people can't authenticate, and others can start doing nasty things like stealing people's identities if they get knocked offline. But still: basic functionality works: you can talk in rooms and with users that are on the reachable part of the network.)

User identities Matrix is more complicated. Each "home server" has its own identity namespace: a specific user (say @anarcat:matrix.org) is bound to that specific home server. If that server goes down, that user is completely disconnected. They could register a new account elsewhere and reconnect, but then they basically lose all their configuration: contacts and joined channels are all lost. (Also notice how Matrix IDs don't look like a typical user address such as an email or XMPP address. They at least did their homework and got the allocation for the scheme.)

Rooms Users talk to each other in "rooms", even in one-to-one communications. (Rooms are also used for other things like "spaces"; they're basically used for everything, an "everything is a file" kind of design.) For rooms, home servers act more like IRC nodes in that they keep a local state of the chat room and synchronize it with other servers. Users can keep talking inside a room if the server that originally hosted the room goes down. Rooms can have a local, server-specific "alias" so that, say, #room:matrix.org is also visible as #room:example.com on the example.com home server. Both addresses refer to the same underlying room. (Finding this in the Element settings is not obvious though, because those "aliases" are actually called "local addresses" there. So to create such an alias (in Element), you need to go in the room settings' "General" section, "Show more" in "Local address", then add the alias name (e.g. foo), and then that room will be available on your example.com homeserver as #foo:example.com.) So a room doesn't belong to a server, it belongs to the federation, and anyone can join the room from any server (if the room is public, or if invited otherwise). You can create a room on server A and when a user from server B joins, the room will be replicated on server B as well. If server A fails, server B will keep relaying traffic to connected users and servers. A room is therefore not fundamentally addressed with the above alias; instead, it has an internal Matrix ID, which is basically a random string. It has a server name attached to it, but that was made just to avoid collisions. That can get a little confusing. For example, the #fractal:gnome.org room is an alias on the gnome.org server, but the room ID is !hwiGbsdSTZIwSRfybq:matrix.org. That's because the room was created on matrix.org, but the preferred branding is gnome.org now. As an aside, rooms, by default, live forever, even after the last user quits. There's an admin API to delete rooms and a tombstone event to redirect to another one, but neither has a GUI yet. The latter is part of MSC1501 ("Room version upgrades") which allows a room admin to close a room, with a message and a pointer to another room.
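To make the alias/room-ID distinction concrete, here is a small sketch of the lookup, using the client-server directory API as I understand it (some servers may require an access token for this):

# Sketch: resolve a room alias to its internal room ID
from urllib.parse import quote
import requests

HOMESERVER = "https://matrix.org"    # any participating home server can answer
alias = "#fractal:gnome.org"

resp = requests.get(
    f"{HOMESERVER}/_matrix/client/v3/directory/room/{quote(alias, safe='')}",
    timeout=10,
)
resp.raise_for_status()
data = resp.json()
print(data["room_id"])      # the internal ID, e.g. something like !hwiGbsdSTZIwSRfybq:matrix.org
print(data.get("servers"))  # servers known to be participating in the room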

Spaces Discovering rooms can be tricky: there is a per-server room directory, but Matrix.org people are trying to deprecate it in favor of "Spaces". Room directories were ripe for abuse: anyone can create a room, so anyone can show up in there. It's possible to restrict who can add aliases, but in any case directories were seen as too limited. In contrast, a "Space" is basically a room that's an index of other rooms (including other spaces), so the existing moderation and administration mechanisms that work in rooms can (somewhat) work in spaces as well. This enables a room directory that works across the federation, regardless of which server the rooms were originally created on. New users can be added to a space or room automatically in Synapse. (Existing users can be told about the space with a server notice.) This gives admins a way to pre-populate a list of rooms on a server, which is useful to build clusters of related home servers, providing some sort of redundancy at the room -- not user -- level.

Home servers So while you can work around a home server going down at the room level, there's no such thing at the home server level, for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported. The documentation used to say you could "run a hot spare" but that has been removed. Last I heard, it was not possible to run a high-availability setup where multiple, separate locations could replace each other automatically. You can have high-performance setups where the load gets distributed among workers, but those are based on a shared database (Redis and PostgreSQL) backend. So my guess is it would be possible to create a "warm" spare of a Matrix home server with regular PostgreSQL replication, but that is not documented in the Synapse manual. This sort of setup would also not be useful to deal with networking issues or denial of service attacks, as you will not be able to spread the load over multiple network locations easily. Redis and PostgreSQL heroes are welcome to provide their multi-primary solution in the comments. In the meantime, I'll just point out that this is handled somewhat more gracefully in IRC, thanks to the possibility of delegating the authentication layer.

Delegations If you do not want to run a Matrix server yourself, it's possible to delegate the entire thing to another server. There's a server discovery API which uses the .well-known pattern (or SRV records, but that's "not recommended" and a bit confusing) to delegate that service to another server. Be warned that the server still needs to be explicitly configured for your domain. You can't just put:
  "m.server": "matrix.org:443"  
... on https://example.com/.well-known/matrix/server and start using @you:example.com as a Matrix ID. That's because Matrix doesn't support "virtual hosting" and you'd still be connecting to rooms and people with your matrix.org identity, not example.com as you would normally expect. This is also why you cannot rename your home server. The server discovery API is what allows servers to find each other. Clients, on the other hand, use the client-server discovery API: this is what allows a given client to find your home server when you type your Matrix ID on login.
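For what it's worth, here is a sketch of the two discovery lookups, assuming the standard .well-known locations (example.com is obviously a placeholder):

# Sketch: server-server and client-server discovery via .well-known
import requests

domain = "example.com"

# server delegation: which host actually speaks federation for this domain
server = requests.get(f"https://{domain}/.well-known/matrix/server", timeout=10).json()
print("federation goes to:", server.get("m.server"))

# client discovery: where a client should point its API calls for this domain
client = requests.get(f"https://{domain}/.well-known/matrix/client", timeout=10).json()
print("clients talk to:", client["m.homeserver"]["base_url"])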

Performance The high availability discussion brushed over the performance of Matrix itself, but let's now dig into that.

Horizontal scalability There were serious scalability issues with the main Matrix server, Synapse, in the past, so the Matrix team has been working hard to improve its design. Since Synapse 1.22 the home server can scale horizontally across multiple workers (see this blog post for details), which can make it easier to scale large servers.

Other implementations There are other promising home server implementations from a performance standpoint (dendrite, Golang, entered beta in late 2020; conduit, Rust, beta; others), but none of those are feature-complete, so there's a trade-off to be made there. Synapse is also adding a lot of features fast, so it's an open question whether the others will ever catch up. (I have heard that Dendrite might actually surpass Synapse in features within a few years, which would put Synapse in a more "LTS" situation.)

Latency Matrix can feel slow sometimes. For example, joining the "Matrix HQ" room in Element (from matrix.debian.social) takes a few minutes and then fails. That is because the home server has to sync the entire room state when you join the room. There was promising work on this announced in the lengthy 2021 retrospective, and some of that work landed (partial sync) in the 1.53 release already. Other improvements coming include sliding sync, lazy loading over federation, and fast room joins. So that's actually something that could be fixed in the fairly short term. But in general, communication in Matrix doesn't feel as "snappy" as on IRC or even Signal. It's hard to quantify this without instrumenting a full latency test bed (for example with the tools I used in the terminal emulators latency tests), but even just typing in a web browser feels slower than typing in an xterm or Emacs for me. Even in conversations, I "feel" that people don't respond as fast. In fact, this could be an interesting double-blind experiment to make: have people guess whether they are talking to a person on Matrix, XMPP, or IRC, for example. My theory would be that people could notice that Matrix users are slower, if only because of the TCP round-trip time each message has to take.
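For the curious, a very partial measurement is not that hard to sketch: time how long the home server takes to acknowledge a message sent through the client-server API. This only covers the client-to-server leg (federation and the receiving client add their own delays), and the server, token and room ID below are placeholders:

# Sketch: time the home server's acknowledgement of a message send
import time
import uuid
import requests

HOMESERVER = "https://matrix.example.com"
ACCESS_TOKEN = "syt_example_token"
ROOM_ID = "!abcdef:example.com"

txn_id = uuid.uuid4().hex   # transaction ID, makes the request idempotent
start = time.monotonic()
resp = requests.put(
    f"{HOMESERVER}/_matrix/client/v3/rooms/{ROOM_ID}/send/m.room.message/{txn_id}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"msgtype": "m.text", "body": "latency probe"},
    timeout=30,
)
resp.raise_for_status()
print(f"server acknowledged in {(time.monotonic() - start) * 1000:.0f} ms")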

Transport Some courageous person actually ran tests of various messaging platforms on a congested network. His evaluation was basically:
  • Briar: uses Tor, so unusable except locally
  • Matrix: "struggled to send and receive messages", joining a room takes forever as it has to sync all history, "took 20-30 seconds for my messages to be sent and another 20 seconds for further responses"
  • XMPP: "worked in real-time, full encryption, with nearly zero lag"
So that was interesting. I suspect IRC would have also fared better, but that's just a feeling. Other improvements to the transport layer include WebSocket support and the CoAP proxy work from 2019 (targeting 100bps links), but both seem stalled at the time of writing. The Matrix people have also announced the pinecone p2p overlay network, which aims at solving large, internet-scale routing problems. See also this talk at FOSDEM 2022.

Usability

Onboarding and workflow The workflow for joining a room, when you use Element web, is not great:
  1. click on a link in a web browser
  2. land on (say) https://matrix.to/#/#matrix-dev:matrix.org
  3. it offers "Element", yeah that sounds great, let's click "Continue"
  4. land on https://app.element.io/#/room%2F%23matrix-dev%3Amatrix.org and then you need to register, aaargh
As you might have guessed by now, there is a specification to solve this, but web browsers need to adopt it as well, so that's far from actually being solved. At least browsers generally know about the matrix: scheme, it's just not exactly clear what they should do with it, especially when the handler is just another web page (e.g. Element web). In general, when compared with tools like Signal or WhatsApp, Matrix doesn't fare so well in terms of user discovery. I probably have some of my normal contacts that have a Matrix account as well, but there's really no way to know. It's kind of creepy when Signal tells you "this person is on Signal!" but it's also pretty cool that it works, and they actually implemented it pretty well. Registration is also less obvious: in Signal, the app confirms your phone number automatically. It's frictionless and quick. In Matrix, you need to learn about home servers, pick one, register (with a password! aargh!), and then set up encryption keys (not the default), etc. It's a lot more friction. And look, I understand: giving away your phone number is a huge trade-off. I don't like it either. But it solves a real problem and makes encryption accessible to a ton more people. Matrix does have "identity servers" that can serve that purpose, but I don't feel confident sharing my phone number there. It doesn't help that the identity servers don't have private contact discovery: giving them your phone number is a more serious security compromise than with Signal. There's a catch-22 here too: because no one feels like giving away their phone numbers, no one does, and everyone assumes that stuff doesn't work anyways. Like it or not, Signal forcing people to divulge their phone number actually gives them critical mass, which means a lot of my relatives are actually on Signal and I don't have to install crap like WhatsApp to talk with them.

5 minute clients evaluation Throughout all my tests I evaluated a handful of Matrix clients, mostly from Flathub because almost none of them are packaged in Debian. Right now I'm using Element, the flagship client from Matrix.org, in a web browser window, with the PopUp Window extension. This makes it look almost like a native app, and opens links in my main browser window (instead of a new tab in that separate window), which is nice. But I'm tired of buying memory to feed my web browser, so this indirection has to stop. Furthermore, I'm often getting completely logged off from Element, which means re-logging in, recovering my security keys, and reconfiguring my settings. That is extremely annoying. Coming from Irssi, Element is really "GUI-y" (pronounced "gooey"). Lots of clickety happening. To mark conversations as read, in particular, I need to click-click-click on all the tabs that have some activity. There's no "jump to latest message" or "mark all as read" functionality as far as I could tell. In Irssi the former is built-in (alt-a) and I made a custom /READ command for the latter:
/ALIAS READ script exec \$_->activity(0) for Irssi::windows
And yes, that's a Perl script in my IRC client. I am not aware of any Matrix client that does stuff like that, except maybe Weechat, if we can call it a Matrix client, or Irssi itself, now that it has a Matrix plugin (!). As for other clients, I have looked through the Matrix Client Matrix (confusing right?) to try to figure out which one to try, and, even after selecting Linux as a filter, the chart is just too wide to figure out anything. So I tried those, kind of randomly:
  • Fractal
  • Mirage
  • Nheko
  • Quaternion
Unfortunately, I lost my notes on those; I don't actually remember which one did what. I still have a session open with Mirage, so I guess that means it's the one I preferred, but I remember they were also all very GUI-y. Maybe I need to look at weechat-matrix or gomuks. At least Weechat is scriptable so I could continue playing the power-user. Right now my strategy with messaging (and that includes microblogging like Twitter or Mastodon) is that everything goes through my IRC client, so Weechat could actually fit well in there. Going with gomuks, on the other hand, would mean running it in parallel with Irssi or ... ditching IRC, which is a leap I'm not quite ready to take just yet. Oh, and basically none of those clients (except Nheko and Element) support VoIP, which is still kind of a second-class citizen in Matrix. It does not support large multimedia rooms, for example: Jitsi was used for FOSDEM instead of the native videoconferencing system.

Bots This falls a little outside the "usability" section, but I didn't know where else to put it... There are a few Matrix bots out there, and you are likely going to be able to replace your existing bots with Matrix bots. It's true that IRC has a long and impressive history with lots of various bots doing various things, but given how young Matrix is, there's still a good variety:
  • maubot: generic bot with tons of usual plugins like sed, dice, karma, xkcd, echo, rss, reminder, translate, react, exec, gitlab/github webhook receivers, weather, etc
  • opsdroid: framework to implement "chat ops" in Matrix, connects with Matrix, GitHub, GitLab, Shell commands, Slack, etc
  • matrix-nio: another framework, used to build lots more bots (a minimal sketch follows after this list) like:
    • hemppa: generic bot with various functionality like weather, RSS feeds, calendars, cron jobs, OpenStreetmaps lookups, URL title snarfing, wolfram alpha, astronomy pic of the day, Mastodon bridge, room bridging, oh dear
    • devops: ping, curl, etc
    • podbot: play podcast episodes from AntennaPod
    • cody: Python, Ruby, Javascript REPL
    • eno: generic bot, "personal assistant"
  • mjolnir: moderation bot
  • hookshot: bridge with GitLab/GitHub
  • matrix-monitor-bot: latency monitor
One thing I haven't found an equivalent for is Debian's MeetBot. There's an archive bot but it doesn't have topics or a meeting chair, or HTML logs.
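Since several of the bots above are built on matrix-nio, here is a minimal sketch of what that looks like, based on my reading of the matrix-nio API rather than on any of the listed bots' code; the home server, user and password are placeholders:

# Sketch: a tiny "!echo" bot built on matrix-nio
import asyncio
from nio import AsyncClient, MatrixRoom, RoomMessageText

async def main() -> None:
    client = AsyncClient("https://matrix.example.com", "@examplebot:example.com")
    await client.login("hunter2")   # placeholder password

    async def on_message(room: MatrixRoom, event: RoomMessageText) -> None:
        if event.sender == client.user_id:
            return  # ignore our own messages
        if event.body.startswith("!echo "):
            await client.room_send(
                room.room_id,
                message_type="m.room.message",
                content={"msgtype": "m.text", "body": event.body[len("!echo "):]},
            )

    client.add_event_callback(on_message, RoomMessageText)
    await client.sync_forever(timeout=30000)  # timeout in milliseconds

asyncio.run(main())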

Working on Matrix As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs, so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out if your specific client has implemented it. (One trendy answer to this problem is to "rewrite it in Rust": the Matrix people are working on implementing a lot of those specifications in a matrix-rust-sdk that's designed to take the implementation details away from users.) Just taking the latest weekly Matrix report, you find that three new MSCs were proposed, just last week! There's even a graph that shows the number of MSCs is progressing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150. That's a lot of text, which includes stuff like 3D worlds which, frankly, I don't think you should be working on when you have such important security and usability problems. (The internet as a whole, arguably, doesn't fare much better. RFC600 is a really obscure discussion about "INTERFACING AN ILLINOIS PLASMA TERMINAL TO THE ARPANET". Maybe that's how many MSCs will end up as well, left forgotten in the pits of history.) And that's the thing: maybe the Matrix people have a different objective than I have. They want to connect everything to everything, and make Matrix a generic transport for all sorts of applications, including virtual reality, collaborative editors, and so on. I just want secure, simple messaging. Possibly with good file transfers, and video calls. That it works with existing stuff is good, and it should be federated to remove the "Signal point of failure". So I'm a bit worried about the direction all those MSCs are taking, especially when you consider that clients other than Element are still struggling to keep up with basic features like end-to-end encryption or room discovery, never mind voice or spaces...

Conclusion Overall, Matrix is somewhere in the space where XMPP was a few years ago. It has a ton of features, pretty good clients, and a large community. It seems to have gained some of the momentum that XMPP has lost. It may have the most potential to replace Signal if something bad were to happen to it (like, I don't know, getting banned or going nuts with cryptocurrency)... But it's really not there yet, and I don't see Matrix trying to get there either, which is a bit worrisome.

Looking back at history I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating: IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and are all still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation. IRC had numerous conflicts and forks, both at the technical and at the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it. The "short" version is:
  • 1988: Finnish researcher publishes first IRC source code
  • 1989: 40 servers worldwide, mostly universities
  • 1990: EFnet ("eris-free network") fork which blocks the "open relay", named Eris - followers of Eris form the A-net, which promptly dissolves itself, with only EFnet remaining
  • 1992: Undernet fork, which offered authentication ("services"), routing improvements and timestamp-based channel synchronisation
  • 1994: DALnet fork, from Undernet, again on a technical disagreement
  • 1995: Freenode founded
  • 1996: IRCnet forks from EFnet, following a flame war of historical proportion, splitting the network between Europe and the Americas
  • 1997: Quakenet founded
  • 1999: (XMPP founded)
  • 2001: 6 million users, OFTC founded
  • 2002: DALnet peaks at 136,000 users
  • 2003: IRC as a whole peaks at 10 million users, EFnet peaks at 141,000 users
  • 2004: (Facebook founded), Undernet peaks at 159,000 users
  • 2005: Quakenet peaks at 242,000 users, IRCnet peaks at 136,000 (Youtube founded)
  • 2006: (Twitter founded)
  • 2009: (WhatsApp, Pinterest founded)
  • 2010: (TextSecure AKA Signal, Instagram founded)
  • 2011: (Snapchat founded)
  • ~2013: Freenode peaks at ~100,000 users
  • 2016: IRCv3 standardisation effort started (TikTok founded)
  • 2021: Freenode self-destructs, Libera chat founded
  • 2022: Libera peaks at 50,000 users, OFTC peaks at 30,000 users
(The numbers were taken from the Wikipedia page and Netsplit.de. Note that I also include other networks' launches in parentheses for context.) Pretty dramatic, don't you think? Eventually, somehow, IRC became irrelevant for most people: few people are even aware of it now. With less than a million active users, it's smaller than Mastodon, XMPP, or Matrix at this point.1 If I were to venture a guess, I'd say that infighting, lack of a standardization body, and a somewhat annoying protocol meant the network could not grow. It's also possible that the decentralised yet centralised structure of IRC networks limited their reliability and growth. But large social media companies have also taken over the space: observe how the IRC numbers peak around the time the wave of large social media companies emerged, especially Facebook (2.9B users!!) and Twitter (400M users).

Where the federated services are in history Right now, Matrix and Mastodon (and email!) are at the "pre-EFnet" stage: anyone can join the federation. Mastodon has started working on a global block list of fascist servers, which is interesting, but it's still an open federation. Right now, Matrix is totally open, but matrix.org publishes a (federated) block list of hostile servers (#matrix-org-coc-bl:matrix.org, yes, of course it's a room). Interestingly, email is also at that stage, where there are block lists of spammers, and it's a race between those blockers and the spammers. Large email providers, obviously, are getting closer to the EFnet stage: you could consider that they only accept email from themselves or between themselves. It's getting increasingly hard to deliver mail to Outlook and Gmail, for example, partly because of bias against small providers, but also because they are including more and more machine-learning tools to sort through email, and those systems are, fundamentally, unknowable. It's not quite the same as splitting the federation the way EFnet did, but the effect is similar. HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing. Which is how Google became one of the most powerful corporations on earth, and how they became the gatekeepers of human knowledge online. I have only briefly mentioned XMPP here, and my XMPP fans will undoubtedly comment on that, but I think it's somewhere in the middle of all of this. It was co-opted by Facebook and Google, and both corporations have since abandoned it to its fate. I remember fondly the days when I could do instant messaging with my contacts who had a Gmail account. Those days are gone, and I don't talk to anyone over Jabber anymore, unfortunately. And this is a threat that Matrix still has to face. It's also the threat email is currently facing. On the one hand, corporations like Facebook want to completely destroy it and have mostly succeeded: many people just have an email account to register on things and talk to their friends over Instagram or (lately) TikTok (which, I know, is not Facebook, but they started that fire). On the other hand, you have corporations like Microsoft and Google who are still using and providing email services because, frankly, you still do need email for stuff, just like fax is still around. But they are more and more isolated in their own silos. At this point, it's only a matter of time before they reach critical mass and just decide that the risk of allowing external mail in is not worth the cost. They'll simply flip the switch and work on an allow-list principle. Then we'll have closed the loop, and email will be dead, just like IRC is "dead" now. I wonder which path Matrix will take. Could it liberate us from these vicious cycles? Update: this generated some discussions on lobste.rs.

  1. According to Wikipedia, there are currently about 500 distinct IRC networks operating, on about 1,000 servers, serving over 250,000 users. In contrast, Mastodon seems to be around 5 million users, Matrix.org claimed at FOSDEM 2021 to have about 28 million globally visible accounts, and Signal lays claim to over 40 million souls. XMPP claims to have "millions" of users on the xmpp.org homepage but the FAQ says they don't actually know. On the proprietary silo side of the fence, this page says
    • Facebook: 2.9 billion users
    • WhatsApp: 2B
    • Instagram: 1.4B
    • TikTok: 1B
    • Snapchat: 500M
    • Pinterest: 480M
    • Twitter: 397M
    Notable omission from that list: Youtube, with its mind-boggling 2.6 billion users... Those are not the kind of numbers you just "need to convince a brother or sister" to grow the network...

14 April 2022

Jonathan Dowland: hledger

This year I've decided to bite the bullet and properly try out hledger for personal accounting. It seems I need to commit to it properly if I'm to figure out whether it will work for me or not. Up until now I'd been strictly separating my finances into two buckets: family and personal. I'd been using GNUCash for a couple of years for my personal finances, initially to evaluate it for use for the family, but I had not managed to adopt it for that. I set up a new git repository to track the ledger file, as well as a notes.txt "diary" that documents my planning around its structure and how to use it, and an import.txt which documents what account data I have imported and confirmed that the resulting balances match those reported on monthly statements. For this evaluation, I decided to track both family and personal finances at the same time. I'm still keeping them conceptually very separate. To reflect that I've organised my account names around that: all accounts relating to family are prefixed family:, and likewise personal jon:.1 Some example accounts:
family:assets:shared    - shared bank account
family:dues:jon         - I owe to family
family:expenses:cat     - budget category for the cat
income                  - where money enters this universe
jon:assets:current      - my personal account
jon:dues:peter          - money Peter owes me
jon:expenses:snacks     - budget category for coffees etc
jon:liabilities:amex    - a personal credit card
I decided to make the calendar year a strict cut-over point: my personal opening balances in hledger are determined by what GNUCash reports. It's possible those will change over this year, as adjustments are made to last year's data: but it's easy enough to go in and update the opening balances in hledger to reflect that. Credit cards are a small exception. January's credit card bills are paid in January but cover transactions from mid-December. I import those transactions into hledger to balance the credit card payment. As a consequence, the "spend per month" view of my data is a bit skewed: All the transactions in December should be thought of as in January since that's when they were paid. I need to explore options to fix this. When I had family and personal managed separately, occasionally something would be paid for on the wrong card and end up in the wrong data. The solution I used last year was to keep an account dues:family to which I posted those and periodically I'd settle it with a real-world bank transfer. I've realised that this doesn't work so well when I manage both together: I can't track both dues and expense categorisation with just one posting. The solution I'm using for now is hledger's unbalanced virtual postings: a third posting for the transaction to the budget category, which is not balanced, e.g.:
2022-01-02 ZTL*RELISH
    family:liabilities:creditcard        -3.00
    family:dues:jon                       3.00
    (jon:expenses:snacks)                 3.00
This works, but it completely side-steps double-entry bookkeeping, which is the whole point of using a double-entry system. There's also no check and balance that the figure I put in the virtual posting (3.00) matches the figure in the rest of the transaction. I'm therefore open to other ideas.

  1. there are a couple of places in hledger where default account names are used, such as the default account that expenses are posted to during CSV imports (expenses:unknown), which obviously doesn't fit my family:/jon: prefix scheme. The solution is to make sure I specify a default posting-to account in all my CSV import rules.
